ABSTRACT


Predicting  future  values  of  a  time  series  has  many  practical  uses  in  real-time  signal 
processing  and  understanding.  This  thesis  implements  an  Adaptive  Time  Delay  Neural 
Network (ATNN) capable of user-defined degeneration to the more common Time Delay
Neural Network (TDNN). Time delays along axons or at the synapses, which vary in
biological  systems,  motivate  this  research.  The  ATNN/TDNN  test  results  and  time  series 
prediction  capabilities  are  compared  to  those  of  the  Real-Time  Recurrent  Learning 
(RTRL)  algorithm.  To  show  the  advantages  and  disadvantages  of  using  TDNN  and 
ATNN  for  prediction  versus  the  RTRL,  the  networks  were  applied  to  two  problems: 
incommensurate  sum  of  sine  waves  and  financial  time  series.  These  data  sets  represent 
examples  of  nonlinear  data  with  known  and  unknown  mathematical  functions, 
respectively.  Although  the  RTRL  predicted  better  than  the  ATNN  for  a  known 
predictable  function,  this  ATNN  approach  proved  competitive  in  determining  the 
direction  of  future  values  for  this  function  and  even  outperforms  the  RTRL  on  the  more 
difficult  prediction  task.  The  ATNN  program,  developed  in  C++  with  an  object-oriented 
framework, also takes much less computation time than the RTRL during training.




PREDICTING  NONLINEAR  TIME  SERIES 

I.  INTRODUCTION 

Most  solutions  to  life's  real  world  problems  involve  some  sort  of  function  prediction. 
Examining  real  world  processes  shows  nature  very  rarely  produces  simple  linear,  easily 
mathematically modeled, functions. Many functions of practical interest, even the basic
sine wave, are nonlinear. Humans possess an innate ability to assimilate information, make
some  sense  of  the  information  (probably  in  some  nonlinear  way  of  which  little  or  nothing 
is  known),  and  predict  some  useful  outcome  or  required  action.  People  constantly 
perform  prediction  involving  trajectories,  like  catching  a  ball  or  avoiding  a  collision  with 
other  vehicles  while  driving,  as  background  tasks.  The  human  brain  solves  these  real 
world  problems  easily,  and  in  time  to  make  the  prediction  useful,  because  they  involve  a 
relatively  low  order  of  dimensionality.  A  trajectory  usually  involves  at  most  three 
dimensional time-space relationships. Nonlinear time series functions present more
difficult  real  world  problems,  though.  The  dimensionality  quickly  becomes  intractable 
even  for  the  human  brain.  Many,  seemingly  extraneous,  factors  affect  these  functions 
forcing  them  towards  unpredictability.  Given  a  complex  function  separated  into  a 
superposition  of  more  simple,  predictable  functions,  the  human  brain's  ability  to  predict 
completely  diminishes  when  the  complex  time  series  function  involves  more  than  two 
sinusoidal  functions  with  an  irrational  ratio  of  their  frequencies  (i.e.,  non  periodic  time 
series functions) [15]. People do not like to (actively) predict [7], especially these
unfamiliar complex functions. For this reason almost all scientific disciplines pursue
mathematical or computer models to accurately predict the behavior of complex nonlinear
processes inherent to their work.
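
As an illustration of the non-periodic signals described above, the following is a minimal C++ sketch, not the thesis code, that samples a sum of sine waves. The function name `sumOfSines`, the unit amplitudes, the zero phases, and the sample spacing `dt` are all assumptions made for this example. Choosing frequencies with irrational ratios, such as 1, sqrt(2), and sqrt(3), gives a sum that never repeats exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thesis): samples
//   x(t) = sum_m sin(freqs[m] * t)   at t = 0, dt, 2*dt, ...
// Unit amplitudes and zero phases are assumptions of this sketch.
std::vector<double> sumOfSines(const std::vector<double>& freqs,
                               std::size_t count, double dt) {
    assert(dt > 0.0);
    std::vector<double> series;
    series.reserve(count);
    for (std::size_t n = 0; n < count; ++n) {
        const double t = static_cast<double>(n) * dt;
        double value = 0.0;
        for (double f : freqs) {
            value += std::sin(f * t);  // each component has amplitude 1
        }
        series.push_back(value);
    }
    return series;
}
```

For instance, `sumOfSines({1.0, std::sqrt(2.0), std::sqrt(3.0)}, 1000, 0.1)` would produce 1000 samples of a three-component incommensurate series.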

Engineers trying to model real world processes usually approximate these nonlinear
processes as a superposition of linearly separable functions. These nonlinear
processes, or functions, are broken into as few linear parts as possible to still obtain
acceptable results. By the time engineering approximations are made, too much
information  is  sometimes  thrown  away  and  some  engineers  and  scientists  feel  they  must 
keep  all  that's  left.  Modeling  of  this  type  usually  results  in  poor  representations  of  the 
original  functions  because  engineers  model  the  smooth  (nonlinear)  transitions  that  nature 
makes  by  too  few  of  these  linear  pieces.  Meteorologists  trying  to  predict  the  weather, 
economists  trying  to  predict  market  behavior,  and  physicists  attempting  to  track  turbulent 
flow  of  fluids,  all  using  ever  faster  supercomputers,  frequently  produce  unsatisfactory 
results  [5].  This  strategy  lasted  through  the  years,  probably,  because  it  was  the  best 
available. 

However, for today, even large numbers of supercomputers working in parallel cannot
accurately reproduce the processing capabilities of an incredible 3-pound piece of meat
called  the  human  brain.  This  biologically  motivated  processing  may  never  be  achievable 
by  man-made  machines,  but  the  ideas  behind  Artificial  Neural  Networks  (ANN)  offer 
some  hope.  They  possess,  as  might  be  imagined  the  brain  does,  the  ability  to  map 
relationships in a nonlinear way. It is time for conventional engineers to rethink their "old"
strategies.  Today's  technologies  obtain  an  abundance  of  information  quickly  and  easily, 
and  current  machines  store  and  sort  this  historical  information  well.  It  is  the  information 
processing  techniques  that  seem  to  be  lacking.  Maybe  engineers  should  take  a  lesson  from 
Mother  Nature,  keeping  only  the  best  parts,  or  features,  of  the  information  like  in  Nature's 
rule: Survival of the Fittest. Working smarter almost always produces better results
than working harder. These fittest features would store only important and often used
information for later processing much as the human brain uses or loses memory.

Human brain development is an interesting phenomenon. Jerison points out:

It's estimated that it took around one million years to make the human brain what it is
today, given this is about the time when a rapid increase in brain size compared to
body weight began in the hominid lineage. There is no evidence of a change in brain
size versus body weight for any other mammals within the past five million years. [6]

The  modeling  of  brain  functions  does  not  have  to  start  at  the  beginning,  though.  Today's 
electronic,  chemical,  and  imaging  technology  allows  us  to  look  deeper  inside  the  human 
skull  and  into  the  brain.  The  human  brain,  most  people  think,  is  the  most  powerful 
information  processor  in  creation.  Therefore,  try  to  more  precisely  model  the  best.  In 
other  words,  if  one  models  and  strives  for  excellence,  he  is  destined  to  achieve  excellence 
[13], or better still

"...if you refuse to accept anything but the best, you very often get it."

-  W.  Somerset  Maugham 

Consider  moving  up  to  a  better  model  of  the  real  world;  stop  accepting  methods  of 
the  past.  Today's  conventional  sensors  provide  enough  information  for  a  human  to  predict 
an  event  outcome  given  relatively  low  dimensionality,  but  automatic  real-time  prediction, 
using  today's  computer  architectures,  is  often  too  slow  and  computationally  expensive. 
Thus,  non  real-time  prediction,  in  most  applications,  equates  to  present  state  estimation, 
not  to  prediction.  Webster  [11]  defines 

to  predict  -  is  to  declare  in  advance;  esp.;  foretell  on  the  basis  of  observation, 
experience, or scientific reason.

So, this thesis defines nonlinear function prediction as the declaration of a future value of
the given function, based on that function's history.

Work performed by Capt Randall L. Lindsey [10] shows properly trained Recurrent
Neural Networks are successful in applications involving time dependencies, including
function prediction. A recently completed competition at the Santa Fe Institute also
found Recurrent Neural Networks among the best, but a Time Delay Neural Network
(TDNN) algorithm won the time series prediction competition [21]. Another recent paper
discusses an Adaptive Time-delay Neural Network (ATNN) algorithm for prediction [9].
This seems a logical improvement over the TDNNs.

1.1  Problem

This  thesis  focuses  on  solving  the  nonlinear  function  prediction  problem  by 
developing  an  ATNN  and  TDNN  program  in  C++.  For  various  types  of  input  data,  the 
results  show  the  comparison  of  these  networks  with  the  subgrouped  RTRL  in  terms  of 
testing  and  training  accuracy  versus  training  time. 

1.2  Background 

Neural network publications span multiple disciplines: neurobiology, physics,
psychology, medical science, mathematics, computer science, and engineering. Since its
revival in the mid 1980's, neural network technology has been changing too rapidly to
include a complete summary of it here. Searching current databases for a narrower
review, including just a few important features for nonlinear function prediction, yields a
manageable review. This Literature Review, as presented in Chapter II, highlights the
following features: Real-Time Recurrent Learning (RTRL) algorithms, subgrouped
RTRL, TDNNs, and ATNNs.

ANN technology continues to improve by leaps and bounds. Meanwhile researchers
and engineers apply the state-of-the-art algorithms to solve time-series prediction
problems. The ANNs above, when properly trained, solve many time series tasks and are
useful in applications.

1.3  Assumptions 

This  thesis  assumes  the  input  vectors  are  actual  past  time  samples  of  the  process 
under  study.  The  only  input  feature  available  to  the  networks  will  be  this  historical  time 
sequence data of the process itself. Therefore, feature extraction from an input pattern is
not  addressed  here,  but  successful  selection  of  the  ANN  parameters  during  the  training 
and  testing  periods  is  essential  for  each  algorithm.  The  processes  are  assumed  to  be 
predictable;  that  is,  they  are  not  purely  random. 
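
The assumption that each input vector consists only of past samples of the process can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the thesis code: the names `Sample` and `makeSamples` are hypothetical, `window` is the assumed number of past samples presented to the network, and `horizon` is the assumed number of steps ahead to predict.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical training pair: a window of past samples and the future value.
struct Sample {
    std::vector<double> history;  // series[start] ... series[start + window - 1]
    double target;                // value `horizon` steps after the window ends
};

// Slides a fixed-length window over the series and pairs each window with
// the sample that lies `horizon` steps beyond its last element.
std::vector<Sample> makeSamples(const std::vector<double>& series,
                                std::size_t window, std::size_t horizon) {
    assert(window > 0 && horizon > 0);
    std::vector<Sample> samples;
    for (std::size_t start = 0; start + window + horizon <= series.size(); ++start) {
        Sample s;
        const auto first = series.begin() + static_cast<std::ptrdiff_t>(start);
        s.history.assign(first, first + static_cast<std::ptrdiff_t>(window));
        s.target = series[start + window + horizon - 1];
        samples.push_back(s);
    }
    return samples;
}
```

With a five-sample series, a window of three, and a horizon of one, this sketch yields two training pairs; no feature other than the series itself is used.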

1.4  Scope 

This  thesis  will  present  an  Adaptive  Time-delay  Neural  Network  for  time  series 
prediction  of  complex  functions  and  compare  the  results  to  the  TDNN  and  the  Real  Time 
Recurrent  Learning  (RTRL)  algorithm. 

1.5  Approach 

The  proposed  plan  is  to  create  a  nonlinear  function  prediction  capability  in  five 
steps.  First,  create  the  Adaptive  Time-delay  Neural  Network  program.  This  ATNN 
algorithm, coded in an object-oriented C++ programming framework, configures to the
more conventional fixed Time-delay Neural Network (TDNN) or Error-Backpropagation
(BP) algorithms with the appropriate user-defined network parameters. Second,
implement an RTRL algorithm to solve a few specific time series function problems.
Third, modify the ATNN into a TDNN and train then test for solving time series
prediction with fixed time delays. Fourth, train and test the full ATNN algorithm with
adaptive  time-delays  as  well  as  weights  for  solving  the  prediction  problem.  Finally, 
compare  all  the  algorithms  in  terms  of  training  and  testing  accuracy  versus  number  of 
training  cycles,  or  epochs. 

This  chapter  provides  a  brief  perspective,  the  goal  of  this  thesis,  and  then  outlines 
the approach to studying this prediction problem. The next chapter will review some of
the background material essential to understanding the current ANNs at the forefront of
complex  function  prediction.  Chapter  III  develops  the  algorithm  for  the  ATNN  (and 
TDNN)  while  Chapter  IV  presents  results  of  applying  time  series  data  to  the  RTRL 
network  as  well  as  the  ATNN  and  TDNN.  Chapter  V  presents  conclusions  and 
recommends directions for further study with these networks.

II.  LITERATURE REVIEW


2.1  Introduction

This  literature  review  summarizes  the  current  state  of  artificial  neural  networks 
(ANN)  for  solving  time  dependent  processes.  Biological  neural  networks  quickly  and 
easily  process  temporal  information;  artificial  neural  networks  should  do  the  same. 
Modeled from biological research, artificial neural networks provide a heuristic approach
to solving problems that could prove quite successful in areas of speech processing and
image  recognition  [8].  Of  the  many  known  varieties  of  ANNs,  literature  suggests  only  a 
few  of  the  major  classes  are  adequate  for  difficult  temporal  processing  and  prediction 
tasks.  One  class,  the  time  delay  neural  network  (TDNN),  incorporates  embedded  time 
delays on the inputs. Another, known as the recurrent neural network, uses time delayed
network outputs that feed back as inputs to encode and learn temporal sequences. Time
dependent processes govern much of the real world. Thus, properly trained artificial
neural networks, both TDNN and recurrent, could prove very successful in applications
involving  time  dependencies. 

Publications in the field of neural networks span all the disciplines of science:
neurobiology,  physics,  psychology,  medical  science,  mathematics,  computer  science,  and 
engineering.  As  such,  a  thorough  summary  of  neural  network  technology  would  contain 
numerous  volumes.  However,  a  sampling  of  current  literature,  centered  on  the  topics  of 
time  series  prediction,  TDNNs,  and  recurrent  backpropagation  neural  networks,  yields  a 
more  focused  review. 

The  scope  of  this  review  focuses  on  current  literature  detailing  studies  of  nonlinear 
time  series  function  prediction.  In  particular,  an  even  more  narrow  focus  on  uses  of 
TDNNs  and  recurrent  backpropagation  neural  network  technology  reveals  a  tremendous 
effort exists to solve problems of this nature. This review contains a short background on
basic neural network theory to aid the reader's comprehension. In addition, this review
highlights the real-time recurrent learning (RTRL) algorithm, subgrouped RTRL, TDNN
algorithm, and finally, an Adaptive Time Delay Neural Network (ATNN). These
algorithms  summarize  improvements  in  neural  network  technology  as  applied  to  function 
prediction. 

2.2  Background 

Biological concepts motivate the application of artificial neural networks to
electronic  machines.  Artificial  neural  networks  attempt  to  copy  or  mimic  the  response  of  a 
true  biological  neuron,  the  most  basic  processing  element  of  the  brain  [14]. 

During  the  late  1950's,  Rosenblatt  invented  a  new  class  of  machines  offering,  as 
many  researchers  thought,  a  natural  and  powerful  model  of  machine  learning  [16].  This 
basic  model,  called  the  perceptron,  consists  of  an  array  of  input  sensory  nodes  randomly 
connected to a second array of associative nodes. The connections, called weights,
randomly  range  from  -1  to  1.  Each  secondary  node  produces  an  output  when  activated 
by  enough  of  the  sensory  nodes  connected  to  it.  Sensory  nodes  capture  outside 
information  for  the  machine,  and  associative  nodes  input  the  information  to  the  machine. 

The output, or response, of the perceptron equals a proportional weighted sum of
the associative nodes' responses. In other words, if $x_i$ denotes the response of the $i$th
associative node and $w_i$ denotes the corresponding connection weight, then the activation
$S$ of the next node with $n$ associative input nodes is

$$S = \sum_{i=1}^{n} w_i x_i \tag{2.1}$$

and the output, or response $R$, of this next node is given as

$$R = f(S)$$

Thus for a positive $R$, the stimulus belongs to class 1, and for a negative $R$, the stimulus
belongs to class 2. In its most basic form, the perceptron simply implements a linear
decision function $f(x)$. The perceptron learns by changing the connection weights to
minimize  the  total  response  error.  The  difference  between  the  desired  output  and  the 
actual computed response of the node defines the nodal error. In equation form,

$$e_n = d_n - R_n$$

where $e_n$ denotes the error of node $n$, $d_n$ denotes the desired value of node $n$, and $R_n$
denotes its computed response. Therefore, the total response error equals the summation
of the squared nodal errors over the entire length of the data set (epoch).
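
The weighted sum, the two-class decision, and the total squared error described above can be sketched in C++ as follows. This is a hedged illustration rather than the thesis implementation: the function names are hypothetical, the response is assumed to equal the activation itself (the simplest linear decision function), and a response of exactly zero is arbitrarily assigned to class 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Activation S = sum_i w[i] * x[i] over the n associative input nodes.
double activation(const std::vector<double>& w, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double s = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        s += w[i] * x[i];
    }
    return s;
}

// Positive response -> class 1, otherwise class 2.
// Treating zero as class 2 is an assumption of this sketch.
int classify(double response) {
    return response > 0.0 ? 1 : 2;
}

// Sum of squared nodal errors (desired minus actual) over one epoch.
double totalSquaredError(const std::vector<double>& desired,
                         const std::vector<double>& actual) {
    assert(desired.size() == actual.size());
    double total = 0.0;
    for (std::size_t n = 0; n < desired.size(); ++n) {
        const double e = desired[n] - actual[n];
        total += e * e;
    }
    return total;
}
```

Learning then amounts to adjusting the weights passed to `activation` so that `totalSquaredError` decreases over the epoch.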

In most applications, a differentiable function, usually the logistic squashing
function (sigmoid), operates on the output of the network. The sigmoid response is
given by

$$f(x) = \frac{1}{1 + e^{-\alpha x}} \tag{2.2}$$

When the sigmoid operates on the weighted sum of the inputs, including a bias term
$\theta$, the resulting output becomes

$$R = f\left(\sum_{i=1}^{n} w_i x_i + \theta\right) \tag{2.3}$$

Figure 1 shows a typical network node, while Figure 2 details the output of the
sigmoid.


Figure 1: Rosenblatt's perceptron model.


Figure 2: A typical sigmoid function with $\alpha = 1$.
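
A node that passes its biased weighted sum through the sigmoid, as in Equation 2.3, can be sketched in C++ as follows. The names `sigmoid` and `nodeOutput` are hypothetical, and the slope parameter `alpha` (defaulting to 1) is an assumption of this sketch rather than a confirmed detail of the thesis code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic squashing function; `alpha` is an assumed slope parameter.
double sigmoid(double x, double alpha = 1.0) {
    return 1.0 / (1.0 + std::exp(-alpha * x));
}

// Output of one node: sigmoid of the weighted input sum plus a bias term.
double nodeOutput(const std::vector<double>& w, const std::vector<double>& x,
                  double bias) {
    assert(w.size() == x.size());
    double sum = bias;
    for (std::size_t i = 0; i < w.size(); ++i) {
        sum += w[i] * x[i];
    }
    return sigmoid(sum);
}
```

Because the sigmoid is differentiable everywhere, its slope is available to gradient-based weight updates, which a hard threshold would not allow.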

Many  other  architectures  propose  extensions  to  this  basic  concept  introduced  by 
Rosenblatt. Multilayer perceptrons, feedforward neural networks, error-backpropagation
networks, and recurrent backpropagation networks name just a few. Error-backpropagation,
discovered by Werbos in 1974 [22] and independently by Parker in 1982
[12], refers to the method of updating the interconnection weights; that is, propagating
the error backward from the output to the input and changing each connection weight to
minimize the total error. Feedforward neural nets refer to connection schemes where the
direction of information flow is strictly from input to the next layer, passing forward
through successive layers to the output. Error passes backward to adjust the weights during
training but does not add information to any given node. Think of error-backpropagation as
weight modification rather than unit activation. In other words, no "information" flows (as
input) from higher-level to lower-level nodes. Many texts contain the derivation of the error
backpropagation algorithm [14]. A purely feedforward network would only react to
external input. Feedforward artificial neural networks typically solve recognition problems
by separating spatial regions. However, they can also "learn" relationships governing the
generation of a time series, then use that "knowledge" to predict future values of the time
series [17]. Building on that idea, recent time delay neural networks spatially present a
fixed time "window" of the sequence to the network inputs. Thus, the TDNN
architectures utilize a fixed number of the actual time sequence values as inputs instead of
the recurrent network architecture where time delayed network outputs feed back to
inputs.

Feedback,  sometimes  called  internal  input,  endows  a  network  with  a  couple  of 
significant features: (a) incorporates multiple time scales into the processing nodes, (b)
processes temporal sequences of inputs [3]. A recurrently connected neural network
contains feedback loops from previous states (timed inputs) as well as the
backpropagation. The next sequentially timed input uses these feedback outputs to create
a predictive output at time t + 1 based on the current input and the previous output.
Information about past inputs is manifest through the learning modified results in this
previous output. As with the input vector, the feedback connections each have their own
adaptable weights. These recurrent weights change, through backpropagation, to
minimize the total error over the epoch length. Figure 3 shows a general layout of a
recurrent neural network. Notice that the current input vector at time t includes a bias
input (always equal to one), the external inputs, and the previous network's output. A
brief look at some of the work accomplished by applying various network designs to time
series data follows.


Figure 3: Basic RTRL architecture, with two outputs, two hidden nodes, and
two inputs [10]. (Input labels: external input data, recurrent output values.)


2.3  Real-Time  Recurrent  Learning  (RTRL) 

Williams  and  Zipser  describe  a  real-time  learning  algorithm  for  training  completely 
recurrent, continually updated networks to learn temporal tasks [23]. This technique uses
uniform starting configurations that contain no previously known information about the
temporal nature of the task. It presents a gradient-following learning algorithm that tracks
the total network error along a trajectory minimizing this total error. Its two prime
advantages include not requiring a precisely defined training interval and operating while
the system is running. The algorithm's main disadvantage consists of requiring nonlocal
communication during training, making it computationally intensive. Yet, this algorithm,
called the real-time recurrent learning (RTRL) algorithm, allows recurrently connected
networks to learn complex tasks that require the retention of information over definite or
indefinite time periods.

2.4  Subgrouped  RTRL 

Whereas the previously discussed RTRL algorithm shows great power and
generality, its disadvantage (CPU intensive) presents a potential problem. Zipser
addresses this problem by proposing an improved technique, called network subgrouping,
which reduces the amount of computation time required by RTRL without changes in the
network connectivity [24]. Subnets, which divide the original network for the purpose of
error backpropagation, leave the network undivided for forward propagation of
activations. This means that, during training, subgroups form only when error propagates
backward through the network's connection weights. During normal feedforward
propagation, the network remains fully connected. This subgrouped RTRL algorithm is
10 times faster when compared to the previous RTRL method performing a specific
learning task [24]. It suffers, however, in that each subgroup now has less memory than
the original RTRL. Zipser suggests compensating for this by using more hidden nodes.

2.5  Time Delay Neural Networks (TDNN)

TDNNs have recently been applied for use in phoneme classification [18]. Figure 4
shows a typical TDNN architecture used for this classification problem. Waibel used this
network, with some success, for the identification of phonemes in Japanese. However,
work done in conjunction with a recent competition held at the Santa Fe Institute proves
the usefulness of TDNNs for solving complex time series prediction tasks [21]. A Finite
Impulse Response (FIR) neural network [19], equivalent to TDNN, won the competition
with recurrent networks finishing close behind [21].

Figure 4: Typical TDNN network where input data is shifted along inputs to the net.
Inputs: bias, x1(t-2), x2(t-2), x1(t-1), x2(t-1), x1(t), x2(t); outputs: y1(t), y2(t).

This approach models the biological synapse as a FIR linear filter. The article also
derives a temporal generalization of the familiar error backpropagation algorithm. This
feedforward  TDNN  method  replaces  the  scalars  of  Equation  2.3  with  vectors  representing 
the  weighted  sum  of  delayed  samples  of  the  inputs.  The  response  becomes 

$$R_j(k) = f\left(\sum_{i=1}^{n} \mathbf{w}_{ij} \cdot \mathbf{x}_i(k) + \theta\right)$$

where $\mathbf{w}_{ij}$ specifies the weights associated with the output of node $i$ to the input of node $j$
in the next layer and $k$ is the discrete time index. The temporal backpropagation algorithm
is similar to the Adaptive Time Delay Neural Network developed in Chapter III. This
algorithm  allows  for  more  computational  efficiency  since  the  number  of  operations  grows 
linearly  with  the  number  of  layers  in  the  network. 
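
The idea of replacing each scalar weight with a vector applied to delayed samples can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not Wan's or the thesis author's code: the names `logistic` and `firNodeOutput` are hypothetical, each connection is assumed to have the same fixed unit-spaced taps, and `history[i][k]` is assumed to hold the output of node `i` delayed by `k` samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic squashing function.
double logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Hypothetical FIR-synapse node.
// weights[i][k] multiplies history[i][k], the output of node i delayed by
// k samples, so history[i][0] is the current sample x_i(t).
double firNodeOutput(const std::vector<std::vector<double>>& weights,
                     const std::vector<std::vector<double>>& history,
                     double bias) {
    assert(weights.size() == history.size());
    double sum = bias;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        assert(weights[i].size() == history[i].size());
        for (std::size_t k = 0; k < weights[i].size(); ++k) {
            sum += weights[i][k] * history[i][k];  // one tap of the FIR filter
        }
    }
    return logistic(sum);
}
```

With a single tap per connection the inner loop collapses to one term, and the node reduces to the ordinary sigmoid node of Equation 2.3.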
2.6  Adaptive Time Delay Neural Networks (ATNN)

Another article presents the logical next step to follow the TDNN [9]. It describes
an improved learning algorithm based on gradient descent for updating the time delays as
well as the synaptic weights. The delays in a biological system occur along axons, due to
factors such as axon length and insulation (myelin sheath), and at the synapses due to the
biochemical processes. This adaptation method provides more flexibility so the network
can attain the optimal time delays associated with the weights to achieve more accuracy
than with fixed delays determined prior to training or by trial-and-error. As a result, an
ATNN network proves slightly better than TDNN because of a better match between
choice of time delay values and the temporal location of the important information in the
input patterns. This thesis will focus on the development of C++ code to implement this
ATNN algorithm and then compare its performance with TDNN, obtained by fixing the
time delays of the ATNN, and subgrouped RTRL.

2.7  Summary

As  technology  improves,  engineers  and  other  researchers  discover  new  and 
innovative  algorithms  for  solving  time-dependent  problems.  All  of  these  algorithms 
discussed here possess the ability, if properly trained, to tackle and solve many difficult
temporal tasks. More great strides in advancing neural network technology require
still further research. Most of these algorithms exist in digital software form. Some
routines  cannot  be  implemented  in  contemporary  hardware.  Therefore,  further  research 
will  determine  whether  or  not  particular  networks  become  physical  hardware  elements, 
thus  greatly  increasing  their  speed  and  utility. 

A  theme  that  carries  throughout  this  thesis  concerns  time  and  the  necessity  for 
network  models  to  reflect  the  fundamental  and  essential  temporal  nature  of  actual  nervous 
systems. Short-term memory allows present access to the recent past, and longer-term
memory relates to the more remote past. To respond to temporal processes, the nervous
system may require a temporal representation. If so, ANNs should also be capable of
representing processes extended in time. Captain Randall Lindsey investigated RTRL
algorithms for predicting time series functions [10]. Captain Jeffrey Dean modified
Lindsey's RTRL code for the subgrouped RTRL algorithm, which this thesis uses due to
its increased performance. The next chapter develops an ATNN algorithm (and TDNN
algorithm by user definitions) and code for comparison to the prediction abilities of the
recurrent networks such as subgrouped RTRL.

III.  METHODOLOGY


3.1  Introduction 

The recent works covered in the Literature Review pertaining to time series function
prediction provide a broad overview of the types of artificial neural networks most
commonly  used  in  today's  prediction  research.  This  thesis  effort  encodes  the  Adaptive 
Time  Delay  Neural  Network  (ATNN).  From  this  code,  the  user  defines  either  an  ATNN, 
Time  Delay  Neural  Network  (TDNN)  or  a  Backpropagation  network  (BP)  as  desired. 
The  resulting  network  gets  tested  in  both  the  ATNN  and  TDNN  modes  for  accurate  time 
series  function  prediction.  As  stated  earlier,  a  previous  AFIT  thesis  by  Lindsey  encodes 
the  Real-Time  Recurrent  Learning  (RTRL)  algorithm  and  yet  another  AFIT  thesis,  by 
Capt  Jeffrey  Dean  currently  near  completion,  modifies  Lindsey's  code  extending  it  to  the 
subgrouped  RTRL  case.  Since  the  subgrouped  RTRL  algorithm  is  basically  the  same  as 
RTRL  but  is  less  computationally  intensive,  this  thesis  compares  subgrouped  RTRL  with 
the  ATNN  and  TDNN  prediction  schemes  developed  here. 

This  methodology  chapter  develops  the  ATNN  algorithm  for  performing  time  series 
function  prediction.  The  explanation  details  the  basic  theory  and  interpretation  of  the 
ATNN  algorithm  used  in  this  thesis.  In  addition,  this  chapter  discusses  how  to  use  the 
code  for  ATNN  learning  versus  TDNN  or  Error-Backpropagation  Network  learning  (the 
latter two are special cases of the more general ATNN algorithm). Finally, this chapter
discusses  the  training  and  testing  procedures  and  applies  them  to  two  specific  problems. 

3.2  ATNN Algorithm Development

This Adaptive Time Delay Neural Network (ATNN) generalizes the common
error-backpropagation, gradient descent method to allow for adapting of time delays as well as
the weights during training. Time delays allow each node in the artificial neural network
to more closely model what is understood of true biological processes. Studies show time
delays exist in the biological nervous system due to impulse transmission, caused by the
varied lengths and insulation of axons, and cell membrane excitation, caused by the
temporal properties at synapses [4]. Time delay network nodes take into account not only
the information from previous layers (as in standard feedforward networks), but they also
remember some of the past information in the delays associated with each interconnection.
This ATNN algorithm interpretation provides a variable length buffer (like a memory) for
each input feature in the data set instead of the spatio-temporal separation method
described for TDNNs in Section 2.5 (Figure 4).

3.2.1  Network  Architecture 

The interconnection scheme for ATNN nodes includes n connections between previous node i
and the next node j. Each connection contains its own time delay and associated weight. To
simplify the variable indexing, the order of indices remains constant throughout this
thesis. The first index corresponds to the next node to which the connection goes, the
second index corresponds to the previous node from which the connection comes, and the
third index gives the particular time delay connection between the two nodes. If only one
index appears on a variable, j relates to the output of the next node and i relates to the
input from the previous node. In their article, Lin and others [9] call this
interconnection scheme between nodes a delay block, as shown in Figure 5, where
$\tau_{jin}$ and $w_{jin}$ are the independent time delays and weights for each nth
interconnection of the block. Summation of the activations into each node occurs the same
as in the basic Rosenblatt perceptron discussed earlier. However, now the activations
result both from the input of previous nodes and the n time delay interconnections for
each of these previous nodes.

Figure 5: Delay block for ATNN [9]


Thus, node j receives the total activation given in the following equation

$$S_j(t_n) = \sum_{i=0}^{N} \sum_{k=1}^{K} w_{jik}\, a_i(t_n - \tau_{jik})$$

where N is the number of input features and K is a maximum number of time delay
interconnections between nodes. In the equation, $a_i(t_n - \tau_{jik})$ is the activation
from the output of a previous node i (or an external input sample) delayed by
$\tau_{jik}$. For simplicity, each node in the network then processes the weighted sum
through the same sigmoid function as described by Equation 2.2.
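
The weighted sum over a delay block can be sketched in a few lines of C++. This is a minimal illustration and not the thesis code: the names `sigmoid`, `nodeOutput`, `history`, `weight`, and `delay` are hypothetical, delays are assumed to be already rounded to whole time steps, and the logistic function is assumed to be the sigmoid of Equation 2.2.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid, assumed here to be the squashing function of Equation 2.2.
double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

// history[i][d] is the activation of previous node i delayed by d time steps
// (d = 0 is the current sample). weight[i][k] and delay[i][k] belong to the
// k-th connection of the delay block from previous node i to this node.
double nodeOutput(const std::vector<std::vector<double>>& history,
                  const std::vector<std::vector<double>>& weight,
                  const std::vector<std::vector<std::size_t>>& delay) {
    double sum = 0.0;
    for (std::size_t i = 0; i < weight.size(); ++i) {
        for (std::size_t k = 0; k < weight[i].size(); ++k) {
            // at() throws if a delay reaches past the end of the buffer.
            sum += weight[i][k] * history[i].at(delay[i][k]);
        }
    }
    return sigmoid(sum);
}
```

Storing the delayed samples per input feature mirrors the variable length buffer interpretation described in Section 3.2, so one input node serves the whole window of delays.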

A user definition file completely specifies the desired ATNN architecture. This file
consists of basic variables used in most artificial neural network configurations (i.e.,
numbers of input (n), hidden (q), and output (p) nodes, weights update learning rate
($\eta_1$), maximum number of iterations through the training data (epochs), and range of
random initialization values for matrices). The file also includes the ATNN specific
variables (i.e., time delay update learning rate ($\eta_2$), momentum, maximum number of
time delays (K), epoch learning flag - to update after each epoch instead of after each
input pattern, tolerance - user defined acceptable difference between desired and network
output, time interval between data points, and TDNN learning flag - for fixing time
delays).

The encoded network structure allows for only one layer of hidden nodes and thus two
layers of weights and two layers of time delays. This ATNN architecture uses the delay
block idea to connect any two nodes, and the nodes fully connect from one layer to every
node in the next layer. This might not be biologically correct, but the learned weight
values in the ANN give relative strength to these interconnections. Thus, the unused
interconnections obtain a very low weighting. Each connection can also have its own
independent time delay. Figure 6 shows a two layer ATNN example (i.e., two layers of
delay blocks) for clarity.
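
The parameters listed for the user definition file can be pictured as a single configuration record. The sketch below is an assumption made for illustration only: the struct name, field names, and default values are hypothetical and do not reproduce the keywords or defaults of the actual definition file.

```cpp
#include <cstddef>

// Hypothetical in-memory form of the user definition file. Field names and
// default values are placeholders, not the keywords of the real file.
struct AtnnConfig {
    std::size_t numInputs = 1;    // n
    std::size_t numHidden = 15;   // q
    std::size_t numOutputs = 1;   // p
    double weightRate = 0.1;      // eta_1, weights update learning rate
    double delayRate = 0.1;       // eta_2, time delay update learning rate
    double momentum = 0.0;
    std::size_t maxDelays = 20;   // K
    std::size_t maxEpochs = 1000;
    double tolerance = 0.01;      // acceptable |desired - network output|
    double timeInterval = 1.0;    // time between data points
    double initRange = 0.5;       // range of random initialization values
    bool epochLearning = false;   // update once per epoch instead of per pattern
    bool fixedDelays = false;     // TDNN learning flag
};

// Basic sanity check on a configuration before a network is built from it.
bool isValid(const AtnnConfig& c) {
    return c.numInputs > 0 && c.numHidden > 0 && c.numOutputs > 0 &&
           c.maxDelays >= 1 && c.weightRate > 0.0 && c.delayRate >= 0.0 &&
           c.tolerance >= 0.0 && c.timeInterval > 0.0;
}
```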


Figure 6: Two Layered ATNN [9] (layers shown: input features (or time processes), hidden
nodes, output nodes)


3.2.2  Learning  Algorithm 

The derivation of the ATNN algorithm presented here follows that of the article by
Lin [9]. This thesis shows only the most important equations for understanding the
algorithm. Many texts contain the complete derivation of the Error-Backpropagation weight
update rule [14], and the article [9] derives the time delay update rule, which is
included in Appendix A for completeness.

First define an instantaneous error measure (MSE) by

$$E(t_n) = \frac{1}{2} \sum_{j \in P} \left[ d_j(t_n) - a_j(t_n) \right]^2 \qquad (3.2)$$

where there are P output nodes with computed values $a_j(t_n)$, and $d_j(t_n)$ denotes the
desired output of node j at time $t_n$. In this thesis, the weights and time delays update
after each input pattern according to the respective error gradients

$$\Delta w_{jik} = -\eta_1 \frac{\partial E(t_n)}{\partial w_{jik}}, \qquad
\Delta \tau_{jik} = -\eta_2 \frac{\partial E(t_n)}{\partial \tau_{jik}}$$

where $\eta_1$ and $\eta_2$ are the learning rates.
Let

(3.6)

where P now is the number of nodes in the next layer, all terms are defined as before, and
K is the maximum number of time delays. Then the learning rules can be summarized as
follows:


(3.7) 

(3.8) 


Because the node activation varies with time, the error gradient with respect to the time
delay requires calculation of the derivative of this activation. According to the Mean
Value theorem of differential calculus, the derivative of the node activation in
Equation 3.8 approximates to

$$a_i'(t_n - \tau_{jik}) \approx \frac{a_i(t_n - \tau_{jik} + r) - a_i(t_n - \tau_{jik} - r)}{2r} \qquad (3.9)$$

where r is the time step between data points; the case $\tau_{jik} = 0$ is handled
separately. The time delays are, in general, zero or positive real numbers. However, here
the update is based on time delays rounded to an integer number of time steps.
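
A central-difference estimate on a sampled buffer, together with the rounding of delays to whole time steps, might look like the following sketch. The function names are hypothetical, and the one-sided differences used at the two ends of the buffer are an assumption of this example, not a statement of how the thesis code treats those cases.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Round a real-valued, non-negative delay (in time units) to a whole number
// of time steps. Negative values are clamped to zero before rounding.
std::size_t delayInSteps(double tau, double step) {
    return static_cast<std::size_t>(std::lround(std::max(0.0, tau) / step));
}

// Central-difference estimate of the time derivative of a sampled signal at
// index n. At either end of the buffer a one-sided difference is used so
// that no sample outside the buffer is read (an assumption of this sketch).
double timeDerivative(const std::vector<double>& x, std::size_t n, double step) {
    if (x.size() < 2) return 0.0;
    if (n == 0) return (x[1] - x[0]) / step;
    if (n + 1 >= x.size()) return (x[x.size() - 1] - x[x.size() - 2]) / step;
    return (x[n + 1] - x[n - 1]) / (2.0 * step);
}
```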

An example of how to interpret these learning rules will be helpful for the unfamiliar
reader. For instance, substituting Equation 3.6 into Equation 3.8 and dropping subscripts,
the time delay update rule for a hidden layer simplifies to

$$\Delta\tau = \eta_2 \sum_{P} \sum_{K} \left[ (d - a)\, a\, (1 - a)\, W_2 \right] h\, (1 - h)\, W_1\, a'$$

The above reads as follows:

the (learning rate) times the sum over all (output nodes) and (time delays) of
[(desired - net output)*(net output)*(1 - net output)*(weights in layer 2)] times
(hidden activation) times (1 - hidden activation) times (weights of layer 1) times
(derivative of input features).

The other instances of learning rules follow a similar interpretation. Given this overview
of the ATNN algorithm itself, the next section explains the code developed for this thesis
effort.

3.3  ATNN Code Development

This ATNN code uses an object-oriented programming (OOP) structure in C++. C++ and OOP are
becoming the standard for software development. The C++ flexibility and reusable "objects"
provide an environment to more easily develop and implement various artificial neural
networks. Even the C programming language does not yet offer a well-developed tool kit for
implementing artificial neural network algorithms. Adam Blum developed an object-oriented
framework, on which this ATNN code is based, for building neural networks in C++ [1]. C++
offers all the advantages of C but also readily supports object-oriented programming.
Programming objects and using them to build a framework, much like a carpenter uses tools
to build a frame house, allows for more flexible and innovative designs. Most of the
applications and developments in Blum's book deal with spatial problem solving. This
thesis effort, on the other hand, designs new methods and classes for representing time
series data and solving time dependent process problems. The new objects allow
incorporation of 3-dimensional matrices, buffered inputs to nodes, propagation and
backpropagation through time, activation derivatives, and time delay optimization
learning. A new class, ATNN, encompasses all these methods that operate together to form
the ATNN algorithm. The ATNN class is a specialization of the net class developed by Blum.
Figure 7 shows the class hierarchy for an ATNN.

Making the ATNN code executable requires several definition and source code files for the
various classes used to build an ATNN. The definition files include: atnn.h, net.h, and
vecmat.h. The source code files include: testatnn.cc, atnn.cc, net.cc, and vecmat.cc.
Complete listings for the required files to compile and use the ATNN program are included
as Appendix C. The ATNN code was developed using Turbo C++ 3.0 on an IBM/compatible 486
system. Although it successfully compiles using Turbo C++ 3.0, training is quite slow on
this single processor system. For that reason, the ATNNs described in this thesis were
trained and tested using the Silicon Graphics systems (IRIS 4D and ONYX).

The details of using the interactive ATNN program are contained in Appendix B. The program
is interactive in that the user sets a maximum number of epochs or a target minus network
output tolerance for determining program completion, but the training process may also be
suspended (by pressing the ESC key on the Silicon Graphics or any key on a PC) and saved
at any point. Subsequent training automatically resumes from the stored network
information (located in the associated .WTS file). This allows for testing at different
points during the training process, which helps in optimizing training.

Figure 7: Class hierarchy for ATNN

3.4  TDNN and BP Implementation

The Time Delay Neural Network and Error-Backpropagation algorithms can be viewed as
special cases of the ATNN algorithm. Fixing the time delays (i.e., not allowing them to
update and learn), while still allowing the weights to update, results in a network
equivalent to the typical TDNN. By fixing the time delays and setting the maximum number
of delays to one (MAXNUMTAU 1 in .DEF file), the Error-Backpropagation algorithm (where
$\tau_{jik} = 0$) results. In this ATNN code, the automatically buffered inputs still
exist, thus each input feature only requires one input node for the TDNN window of delayed
inputs. This thesis uses only the TDNN special case. The network definition file (.DEF
file) sets the TDNN condition by simply setting TDNN 1 for the fixed time delays case. If
TDNN 0, time delays will update and learn as in the general ATNN case.
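
The way the two settings select a special case can be summarized in a short sketch. The enum, the function names, and the clamping of delays at zero are assumptions made for illustration; only the roles of the TDNN flag and the maximum number of delays come from the text above.

```cpp
#include <cstddef>

enum class LearningMode { Atnn, Tdnn, Backprop };

// fixedDelays plays the role of the TDNN flag and maxDelays the role of
// MAXNUMTAU in the definition file.
LearningMode learningMode(bool fixedDelays, std::size_t maxDelays) {
    if (!fixedDelays) return LearningMode::Atnn;
    return maxDelays == 1 ? LearningMode::Backprop : LearningMode::Tdnn;
}

// Apply a computed delay change only when delays are allowed to learn.
// Clamping at zero keeps delays zero or positive (an assumption here).
double updatedDelay(double tau, double change, LearningMode mode) {
    if (mode != LearningMode::Atnn) return tau;
    const double next = tau + change;
    return next < 0.0 ? 0.0 : next;
}
```

Weights are updated in all three modes, so only the delay update needs to be guarded.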

3.5  RTRL Implementation

This section describes how another type of artificial neural network, the subgrouped
Real-Time Recurrent Learning (RTRL) algorithm, was implemented for comparison in this
thesis. The RECNET code, originally written by Lindsey and then modified by Dean, is used
much the same as described in Appendix A of Lindsey's thesis [10]. Due to Dean's
modifications, the "parameters.dat" file includes more network definitions, each of which
is explained in a comments section of the file itself. The data file structure remains the
same as that in Lindsey's code.

The RECNET code requires a different data format than the ATNN code. The data file
structure for using RECNET includes on the top line: the number of input features, the
number of external output nodes, the total number of nodes (external output plus hidden),
and the number of input-output pattern pairs. All subsequent lines contain the input
features (the number of these columns equals the first number in the top line) and desired
outputs (the number of these columns equals the second number in the top line), all
separated only by spaces. Each line corresponds to a separate timed event in the time
series process.

RECNET stores its output and error information in a manner such that each different
network configuration and data set requires its own directory. Otherwise, the resulting
network information from one network overwrites the previous output files.

3.6  Applications 

Time series function prediction was attempted for two specific applications. The first
application chosen was a sum of incommensurate sine waves. This means the ratio of the
sine wave frequencies is an irrational number, resulting in a nonperiodic function. In his
thesis, Captain James Stright provides a measure of the randomness of this function called
the fractal dimension [17]. Fractal dimension, as determined by the Grassberger-Procaccia
method used by Stright, begins at 1.0 for a straight line. Very smooth curving data has a
low fractal dimension and more "ragged" data has a higher fractal dimension. For the
incommensurate sine wave function defined by

$$y(t) = \left[ 2 + \sin(\sqrt{2}\,t) + \sin(\sqrt{3}\,t) \right] / 2, \qquad (3.10)$$

the Grassberger-Procaccia method yields a fractal dimension of 1.7. Figure 8 shows a graph
of a small sample of y(t) from Equation 3.10. A higher sample rate would make this data
smoother; however, this is the data used for testing and training here. This data should
be of low enough dimensionality that artificial neural network prediction is learnable
with a reasonable number of nodes and/or time delays. Stright's predictor network obtained
an average MSE of 0.349 from the instantaneous MSE of the points given in his thesis. This
result, though, was from a multilayer perceptron implementation and was only trained for
1200 epochs with the previous 3 inputs and a total of 20 hidden nodes (15 in one layer and
5 in another). Although this is a benchmark for this application, only the three networks
explicitly discussed in this thesis will be compared. Chapter IV discusses the results of
this application in terms of the function prediction comparisons.

Figure 8: Sum of Two Incommensurate Sine Waves
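
Equation 3.10 is straightforward to sample. The sketch below generates such a series; the function names are hypothetical, and the sampling interval is left as a parameter because the interval used for the thesis data is not stated in this section.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of two incommensurate sine waves from Equation 3.10, scaled so that
// every value lies in the interval [0, 2].
double incommensurateSine(double t) {
    return (2.0 + std::sin(std::sqrt(2.0) * t) + std::sin(std::sqrt(3.0) * t)) / 2.0;
}

// Sample the function at t = 0, step, 2 * step, and so on.
std::vector<double> sampleSeries(std::size_t count, double step) {
    std::vector<double> y;
    y.reserve(count);
    for (std::size_t n = 0; n < count; ++n) {
        y.push_back(incommensurateSine(static_cast<double>(n) * step));
    }
    return y;
}
```

Because the frequency ratio is irrational, the sampled sequence never repeats exactly, which is what makes it a useful first test for a predictor.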

The second application considered for this thesis contains data even more complex than the
incommensurate sine wave data. Data with fractal dimension between 1.95 and 7.5 is
considered. Prediction of nonlinear time series functions such as pilot head motion, given
the time series position (in x, y, z 3D Euclidean distance space) or acceleration data, is
of particular importance to the Air Force. These types of functions probably have an order
of dimension near or within that of chaotic data. But if a prediction network predicts
future values of chaotic data more accurately than 50 percent of the time, it is beating
chance and should perform quite well on head motion data. Since a readily available source
of data exists in the area, a set of financial data (British pound opening price data) was
chosen as an example data set that is usually considered to be in this category of chaotic
data. Figure 9 shows a sample of the data set used for this application. It is,
definitely, a good example of a nonlinear, dynamic process which cannot be predicted
consistently and accurately by the human brain. The equations for market price processes
are currently unknown, but in fact the process is assumed not to be random. The equations
are most likely nonlinear, stochastic, delay-differential equations because of market
response to certain real-world inputs [2].

Comparison  of  similarly  configured  networks  and  a  discussion  of  the  abilities  of 
each  type  network  to  learn  these  application  tasks  is  presented  in  Chapter  IV. 

Figure 9: Financial time series data (vertical axis: opening price)


3.7  Training and Testing the Algorithms

Each of the data sets was scaled by the ATNN code. However, normalizing the data (i.e.,
subtracting the mean and dividing by the standard deviation) prior to input allows the
networks to learn without saturating the sigmoid functions. Thus, the network prediction
should more accurately follow the desired output. Experience shows that prediction of real
world processes must be made over very short time frames because the causal forces driving
these processes change so rapidly. For that reason, this thesis only attempts to predict
one time step ahead.
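
The normalization step can be sketched as follows. This is an illustrative example rather than the thesis code: the function name is hypothetical, and the population standard deviation (dividing by the number of samples) is an assumption, since the text does not say which form was used.

```cpp
#include <cmath>
#include <vector>

// Normalize a series to zero mean and unit standard deviation. A constant
// series (zero standard deviation) maps to all zeros.
std::vector<double> normalize(const std::vector<double>& x) {
    if (x.empty()) return {};
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(x.size());

    double variance = 0.0;
    for (double v : x) variance += (v - mean) * (v - mean);
    variance /= static_cast<double>(x.size());  // population variance (assumed)
    const double sd = std::sqrt(variance);

    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back(sd > 0.0 ? (v - mean) / sd : 0.0);
    return out;
}
```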

Initially, each of the three network types (i.e., RTRL, TDNN, and ATNN) was trained on
each of the applications. Using one input, one output, various numbers of hidden nodes,
and various time delays, a configuration evolved, for each training set, such that the
networks learn to predict reasonably well. With these network configurations established,
training commenced to a point where the error dropped considerably and then leveled. In
the RTRL algorithm, the learning rate automatically decays by a factor of 2 once the MSE
levels. Therefore, the ATNN and TDNN training runs were continued in the same manner
through the interactive program. Testing was performed at several intermediate points to
ensure the networks were indeed learning. From the point where the MSE began to drop
significantly, the networks were trained only a few iterations at a time with testing in
between the training runs. The goal of this method was to minimize the testing set error
while maintaining a low training set error. The result, hopefully, gives an optimized set
of weights and/or time delays for each network.

3.8  Summary 

This  chapter  describes  the  methodology  for  developing,  training  and  testing  the 
ATNN  algorithm  and  code.  It  also  discusses  the  implementation  of  the  TDNN  (a  special 
case  of  the  ATNN)  and  the  subgrouped  RTRL  algorithms  to  be  compared  with  the 
ATNN.  A  brief  description  of  the  applications  used  to  train  and  test  the  networks  is 
followed  by  the  procedures  used  for  collecting  comparison  data.  Chapter  IV  contains  all 
the comparison results of the applications using the three different types of artificial
neural networks as well as a discussion of the prediction capabilities of each.

IV.  RESULTS AND DISCUSSION


Chapter III covered the development, training, and testing of the Adaptive Time Delay
Neural Network (ATNN), as well as the implementation of the Time Delay Neural Network
(TDNN) and subgrouped Real-Time Recurrent Learning (RTRL) algorithms. It discussed a means
for comparing these algorithms as applied to the problem of predicting nonlinear, time
series processes. This chapter contains the results achieved for each of the three types
of artificial neural networks and a discussion which compares their prediction
capabilities. Two specific nonlinear time series applications were studied. The first, the
sum of two incommensurate sine waves, tackles a relatively easy prediction task as proof
that the new ATNN algorithm works reasonably well as a prediction network. The second
application, predicting the time series function related to a set of financial data,
covers a much more difficult prediction task: given, as a single time series process, only
the historical data associated with a real world problem, predict the future activity of
the process. There is more embedded information in such a process than even the human
brain can accurately predict with any great consistency. The artificial neural networks,
on the other hand, are capable of predicting even the financial time series data to some
degree. Thus, if the result can be obtained in time to be useful, the human user of the
system will be relieved of some of the burden involved in trying to make sense of the raw
data from inadequate or noisy sensing devices.

The following results show that each network is able to predict with a great deal of
accuracy. Instead of comparing the networks on the basis of some arbitrary tolerance for
determining percent correct, a Mean Square Error (MSE) from Equation 3.2 is used; it
relates the target output and the predicted output directly. To set the stage for
comparison, a great many training and testing runs of each network's code were performed.
This trial and error method was not meant to optimize the configuration of each type of
network individually but, instead, to determine the best configuration which could predict
reasonably well with all three. The result provides a means for comparing network
performance in terms of training time efficiency and mean square error (MSE) of predicted
outputs.

4.1  Incommensurate Sine Wave Results

The sum of incommensurate sine waves exemplifies a function which is just on the verge of
nonpredictability, in human terms. It is a good starting point from which to determine the
capabilities of an untested predictor network. For the results presented here for the
RTRL, TDNN, and ATNN, each network contained one input node, one output node, and 15
hidden nodes. The task was to predict only one time step (t+1) into the future. The RTRL
was given only the time delayed output for the previous input. The TDNN and ATNN inputs
were buffered to allow a window size up to the last 20 input time samples (max = 20).

4.1.1  RTRL  Results 

The RTRL prediction, as shown in Figures 10 through 12, resulted in a time averaged MSE
below 0.001. As seen in Figure 10, the predicted values usually overshoot or undershoot
the actual desired output a little, but the functional form is maintained almost exactly
throughout the test set. For this simpler (low order dimensionality) application, RTRL
provided the best prediction results with the given network parameters. The RTRL network
started the training initially with a very low MSE, because it was designed to provide
feedback and the best possible results in real-time. Thus, Figure 12 shows that the RTRL
has a shallow, almost linearly decreasing learning curve.

Figure 10: Incommensurate Sine Wave Prediction with RTRL
Test data (15 hidden nodes)


Figure 11: Incommensurate Sine Wave Prediction with RTRL
Mean Square Error (MSE) for test data


Figure 12: Incommensurate Sine Wave Results with RTRL
average Mean Square Error (MSE) during training
(15 hidden nodes)


4.1.2  TDNN Results

Predicting the incommensurate sine wave function with fixed time delays, using the TDNN
special case of the code developed in this thesis, proved that the algorithm is capable of
learning the prediction task with the same number of hidden nodes as the RTRL. With no
feedback mechanism, the TDNN must rely on the past inputs, versus previous outputs derived
from learning in the RTRL, to learn the appropriate input-output relationships of a
function. The prediction results shown in Figure 13, for testing after the first 300
training epochs, were the best obtained for TDNN. More training did not significantly
change the MSE, as can be seen in Figure 14. The initial learning curve for the TDNN was
quite steep and then remained stable. The predicted output maintains the general form of
the desired output, but the peaks are smoothed instead of sharp decision points. Also, the
predicted output appears to lead the desired output in quite a few instances, thus
increasing the error.

Figure 13: TDNN Incommensurate Sine Wave Prediction
Test Data (15 hidden nodes, 10 time delays)


Figure 14: Incommensurate Sine Wave Prediction with TDNN
average MSE per epoch during training
(15 hidden nodes, 10 time delays)

4.1.3  ATNN Results


In applying the ATNN algorithm to the incommensurate sine wave prediction task, it was
hoped that a better time dependency relationship could be learned. Figure 15 shows that
the ATNN was unable to significantly improve upon the learned time relationship found with
the TDNN. There is an improvement, however, in that the ATNN relates the peaks better than
the TDNN, thus not smoothing the prediction at the peaks. The functional form of the
prediction compares more closely to the desired form, but the time averaged MSE was the
same as the TDNN for this best case ATNN prediction after 300 epochs. The learning curve
presented in Figure 16 shows that the ATNN learns as well as, but in significantly fewer
training cycles than, the TDNN. The instability in the learning curve is due to a high
learning rate set for the time delay update rule. A decaying time delay learning rate was
incorporated manually through the interactive capability of the ATNN code to overcome this
instability. Decaying learning rates were proven by Lindsey for the RTRL and were verified
in this research to benefit this ATNN algorithm as well. The next application tasks the
prediction capability of these algorithms given a real world problem with no accurately
known mathematical model.

4.2  Daily Financial Data Results

Financial time series data, as seen earlier, provides an interesting example of a real
world dynamic process about which very little is known as far as rigorous mathematical
modeling goes. Historical data is readily available. British Pound opening price data was
chosen for this application. This example has proven in the past to be highly
unpredictable by human forecasters tracking raw data. It is hypothesized that a neural
network which incorporates time dependency relationships will aid the human in
assimilating the information, as normally provided, to better predict future values.

Figure 16: Incommensurate Sine Wave Prediction using ATNN
time averaged MSE per epoch for training data
(15 hidden nodes, 20 time delays)

Again each of the three networks was trained and tested as discussed in Chapter III. In
all the results presented here, it was found that the best configuration for comparing the
networks, given one input and one output, would contain 15 hidden nodes. The TDNN and ATNN
were provided with a maximum of 20 time delays. The case of providing only 10 time delays
to these two networks is also presented for comparison. The task was still to predict only
one time step (t+1) into the future.

4.2.1  RTRL  Results 

Applying the RTRL algorithm to the given data initially proved unsuccessful in predicting
the desired output because the test data, although normalized on the entire data set,
contained very few points that related directly to the training data set provided. Better
results were obtained after scaling the input data in the same manner as the ATNN code
performs automatically. This extra procedure prevents the saturation of the sigmoid
function so that when the test data lies outside the norm, usable results can still be
obtained. Results of the RTRL are presented in Figures 17 through 19. As seen in Figure
17, the RTRL does learn the financial data time series quite well. However, Figure 18
shows that as this network predicts on points further away from the actual known data, the
instantaneous MSE gradually increases. Since the RTRL feeds the output, with its
associated error, back to the input and has no long term memory, this was expected. Figure
19 shows the resulting average MSE, or learning curve, during training. As seen, the RTRL
learns to predict with a time averaged MSE of about 0.001. This RTRL prediction capability
will be compared to that of the TDNN and ATNN in the next subsections.

Figure 18: Financial Test Data Prediction with RTRL
instantaneous Mean Square Error (MSE) for test data

Figure 19: Financial Training Data using RTRL
average Mean Square Error (MSE) per epoch
4.2.2  TDNN  Results 

Results  of  predicting  the  next  value  of  the  British  pound  data,  using  the  TDNN  with 
10  time  dela3rs,  are  shown  in  Figure  20.  Comparing  this  to  a  TDNN  with  20  time  delays, 
as  in  Figure  22,  shows  the  importance  of  adding  an  acceptable  amount  of  memory 
aq)ability.  The  20  TDNN  resuhs  follow  the  fiinctioiud  form  of  the  desired  output  much 
more  closely  than  with  just  10  time  delays.  These  prediction  results  are  given  for  the  test 
data  after  training  the  10  and  20  time  delay  netwoiics  for  1000  and  500  epochs, 
respectively.  These  were  the  best  test  results  obtained  over  the  whole  training  period. 
The  time  averaged  MSE  for  the  20  delay  case  proves  only  slightly  better  than  with  10 
delays,  as  shown  in  Figures  21  (for  10  time  delays)  and  Figure  23  (for  20  time  delays). 
This  is  true  because  the  10  delay  case  has  a  portion  of  the  results  right  through  part  of  the 
test  set,  but  even  there  it  does  not  follow  the  time  varying  nature  of  the  real  process.  With 
fewer  training  cycles,  the  20  delay  case  cleariy  attempts  to  follow  the  real  process  the  best. 
Figure 21: British Pound Data Prediction with TDNN
average MSE per epoch (15 hidden nodes, 10 time delays)

Figure 22: British Pound Data Prediction with TDNN
Test data (15 hidden nodes, 20 time delays)

Figure 23: British Pound Data Prediction with TDNN
average MSE per epoch (15 hidden nodes, 20 time delays)

It should also be noted that the first 2K data points in each test set should be the last 2K
known data points of the process (where K is the maximum number of time delays as
configured in the .DEF file). This is because the hidden node buffer must be filled before
prediction actually begins, thus providing a longer term memory for TDNNs than RTRL.
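
The buffer-filling requirement can be sketched as follows. This is only an illustration and is not part of the thesis code: the function name make_test_series and the use of std::vector<double> are assumptions made here. The one rule taken from the text is that the first 2K points of a test set are the last 2K known points of the process.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the thesis code): prepend the last 2*K known
// samples to the samples to be predicted, so that the input and hidden node
// buffers are full before the first real prediction is made.
std::vector<double> make_test_series(const std::vector<double>& known,
                                     const std::vector<double>& unseen,
                                     std::size_t K)
{
    const std::size_t warmup = 2 * K;
    if (known.size() < warmup)
        throw std::invalid_argument("need at least 2*K known samples");

    // Copy the trailing 2*K known samples, then append the unseen samples.
    std::vector<double> series(known.end() - static_cast<std::ptrdiff_t>(warmup),
                               known.end());
    series.insert(series.end(), unseen.begin(), unseen.end());
    return series;
}
```

Under these assumptions, a network with K = 10 would see 20 known samples before the first test point of interest.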

4.2.3  ATNN  Results 

Applying the ATNN resulted in more erratic learning when high fixed learning rates
were used, as shown in Figure 25, but the predicted values match the functional form of
the test data set even better than the best TDNN case or the best RTRL. Although the
predicted values are slightly different from the desired output, they follow the peaks of the
real process with considerably more detail than the TDNN (see Figure 24 for the 10 time
delay ATNN). The minimum MSE is an order of magnitude better than either the RTRL
or the TDNN case. Again, as with the TDNN, the first values of the test set must be
the known past time samples. The increased instantaneous MSE in Figure 26 is the result
of not allowing the buffer to fill properly. The predicted values for this network do follow
the desired output but not as well as for the 10 delay case. Figure 27 shows the results of
the same 20 delay network (after 745 epochs) when the buffers are allowed to fill prior to
the desired prediction. Better instantaneous MSE is obtained for the actual test prediction
points. As seen in Figure 28, a more stable averaged MSE results during training for this
network. The learning rates were held at 0.1 for this network during training. Thus, small
learning rates keep the network more stable but require more training epochs to obtain
even the accuracy seen in Figure 27. Here again, it is seen that the amount of memory
required to learn a particular process is important. Too much memory can "confuse" the
network, requiring extra long training times. The ATNN does prove capable of learning
the time relationships better than the other two networks tested, by an order of magnitude
in the 10 delay case.
Figure 24: British Pound Data Prediction using ATNN
Test data (15 hidden nodes, 10 time delays)

Figure 25: British Pound Data Prediction using ATNN
time averaged MSE per epoch (15 hidden nodes, 10 time delays)

Figure 26: British Pound Data Prediction using ATNN
Test data (15 hidden nodes, 20 time delays)

Figure 27: British Pound Data Prediction using ATNN
Test data (15 hidden nodes, 20 time delays)

Figure 28: British Pound Data Prediction using ATNN
time averaged MSE per epoch (15 hidden nodes, 20 time delays)

4.3 Discussion

The results presented in the previous section clearly show that for a simpler
function, the subgrouped RTRL algorithm performance is almost unbeatable. The TDNN
and ATNN predict the incommensurate sine wave fairly well compared to the RTRL. The
MSE for the RTRL is an order of magnitude better on the incommensurate sine data than
that of the time delay networks. The RTRL algorithm, even this 10 times faster subgrouped
RTRL, takes 12 times longer per training cycle than the ATNN code. Running on the
100 MHz Silicon Graphics ONYX systems, the RTRL implementation utilizes only a
single processor per neural network. The object-oriented programming design of the
ATNN allows the work to spread over the available processors (there are 4 running at
100 MHz each on the ONYX). Once trained, though, all three types of networks predict in
roughly the same amount of time, as would be expected since the number of hidden nodes
was held the same and since the networks are no longer learning. As seen in Figures
11 and 18, the instantaneous error begins to increase slightly as the RTRL prediction gets
further from the known values. With the amount of time it takes to train, this RTRL
algorithm may be impractical for some real world problems. Retraining is probably
necessary quite often to accurately predict highly dynamic processes. The RTRL may
have to be retrained even more often than feedforward type networks since error, which
feeds back to the input in the RTRL, may eventually degrade its abilities. The time delay
type networks may be better for predicting real world processes, which are often more
nonlinear and more complex, thus requiring some amount of long term memory
incorporated in the learning process.

The results, for an example of this real world type of process, show that the ATNN
code developed here will be more adaptable to the inputs, especially if their values are not
known to directly relate to the trained values before testing. Once the input data from
past known events are buffered, the ATNN learns to predict the direction of change for the
future values of a real world process exceptionally well, even for test data that is outside
the norm of the training set. Therefore, for predicting the direction of future values of a
highly nonlinear process, the ATNN wins, by an order of magnitude in MSE, over the
RTRL or TDNN.

4.4  Summary 

This chapter discusses the results of research performed using the ATNN code
developed in this thesis. By comparing the ATNN and TDNN, the importance of learning
the best time delay values became apparent. By varying the fixed time delays in the
TDNN as well as letting the ATNN do that work, better prediction resulted. Two specific
nonlinear time series applications were studied. The RTRL beat the ATNN and TDNN
for the simpler nonlinear function in terms of absolute accuracy. However, the
ATNN learned quickly and provided very accurate prediction of the process direction.
The ATNN outperformed the others by far when given a complex, real world process.

After presenting the results, a brief discussion is included on the computation times
of the computer codes used (i.e., ATNN and RECNET). The algorithm for ATNN
inherently takes less time than RECNET. RECNET also suffers in that it only runs on a
single processor whereas ATNN makes use of the multi-processor environment when
available. Conclusions and recommendations for future work in this area are given in the
next chapter.
V. CONCLUSIONS AND RECOMMENDATIONS


This thesis implements an Adaptive Time Delay Neural Network (ATNN) capable
of user-defined degeneration to the more common Time Delay Neural Network (TDNN)
or Error-Backpropagation Network (BP). The algorithm test results and time series
function prediction capabilities as compared to the RTRL algorithm show the advantages
and disadvantages of ATNNs for prediction. Time series prediction, defined as
determining the future value of a process based on historical data, applies to a great number
of real world situations. Many of these applications are also extremely important to the Air
Force.

5.1  Conclusions 

Lindsey demonstrated the RTRL algorithm's ability to learn several time dependent
functions. He showed that the RTRL was more robust than the best linear predictor. This
thesis therefore compares an RTRL algorithm to the prediction ability of a new algorithm,
ATNN, which learns the optimum time delays on the input facts. This new approach to
solving the time series function prediction task proves useful in both determining the
direction of future values and taking less computation time during training. Although the
RTRL clearly beat the ATNN at predicting for a known predictable function, the ATNN
still performed to within 10 percent of the almost exact RTRL capability. The TDNN
prediction capability came in last for this case because of a smoothing effect at the
nonlinearities. It was the ATNN that bested both the TDNN and RTRL when applied to
a real world process with no known mathematical function. The RTRL had problems
handling different inputs than those of its training set due to a scaling problem. The
TDNN again learned the overall trend in the data set but tended to smooth over the
nonlinearities. Thus for a highly nonlinear process the ATNN outperformed the others by
far. This comparison testing demonstrates that the ATNN should be very good at
predicting any process, but if the process has a high degree of randomness, the ATNN can
pick out and predict even the fine sample to sample relationships. The other networks
tested here could not.

Every  algorithm  studied  as  part  of  this  research  effort  incorporates  its  own  strengths 
and  limitations.  What  works  best  will  depend  on  the  problem  the  network  attempts  to 
solve.  There  exist  many  ways  to  tailor  a  network.  Fit  between  the  problem  and  solution 
governs  the  choices.  It  seems  evolution  may  try  various  modifications  in  network  design, 
use  them  for  a  while,  then  keep  those  that  solve  the  problems  the  best.  The  central 
nervous  system  does  not  conform  to  a  single  network  layout  but  uses  the  right  design  in 
the  right  places.  The  best  network  design  depends  on  the  task  and  how  the  evolutionary 
choosing  process  shook  out.  For  engineers  trying  to  match  artificial  neural  networks  to 
problems  and  solutions,  the  most  effective  approach  might  be  to  develop  a  toolbox 
containing various network layouts, apply those that make sense, and keep only the most
efficient  network  for  the  given  problem.  This  thesis  steps  toward  the  process  of  building 
an  effective  toolbox  for  solving  real  world  problems. 


5.2  Recommendations 

Real world inputs generally have more dimensions than those chosen in a laboratory.
This scaling problem concerns whether the scaled network can incorporate all the relevant
dimensions and still perform the prediction task in real time. Also, how are the correct
inputs to be added to the relevant dimensions of a complex process determined? The ATNN
code developed for this thesis allows the user to provide a larger number of input features
than the typical TDNN input scheme for trying to predict the future values in a given
process. Therefore future work could incorporate this ability by providing the network
with other processes that seem to affect the process under study.

Many  processes  associated  with  the  real  world  are  highly  nonlinear  and  seemingly 
unpredictable.  These  are  very  interesting  to  study  because  these  "random"  processes 
usually  are  not  completely  random.  Potential  exists  for  moving  them  from  the  realm  of 
unpredictable to some degree of predictable using artificial neural network technology.
One  very  interesting  problem  for  the  Air  Force  is  predicting  pilot  head  motion,  which  is 
often  a  highly  nonlinear  process.  Prediction  seems  possible  given  the  pilot's  situation  at  a 
particular  instant  in  time  is  known.  This  prediction  could  be  used  to  update  virtual  cockpit 
displays  faster.  By  predicting  where  the  head  will  be  just  a  tenth  of  a  second  ahead,  it  is 
possible  to  make  the  computer  generated  scene  on  a  helmet  mounted  display  seem  more 
real,  or  virtual.  This  thesis  provides  a  comparison  of  one  method,  ATNN,  to  another, 
RTRL,  which  might  be  used  to  perform  this  prediction  task.  Much  more  research  is 
needed  in  the  area  of  neural  networks,  and  research  using  them  for  predicting  time  series 
processes  has  really  just  begun. 
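APPENDIX A. Time Delay Update Rule

This appendix presents the derivation of the time delay update rule found in the
article by Lin [9] and used in this thesis.

Define an instantaneous error measure (MSE) as in Equation 3.2.

$$E(t_n) = \frac{1}{2} \sum_{j \in P} \left( d_j(t_n) - a_j(t_n) \right)^2$$

where P is the number of output nodes with computed values. The term $d_j(t_n)$
denotes the desired output of node j at time $t_n$. Then by the chain rule

$$\frac{\partial E(t_n)}{\partial \tau_{jik}} = \frac{\partial E(t_n)}{\partial S_j} \frac{\partial S_j}{\partial \tau_{jik}} \tag{A.1}$$

The second term in Equation (A.1) is given by

$$\frac{\partial S_j}{\partial \tau_{jik}} = \frac{\partial}{\partial \tau_{jik}} \left( \sum_{i \in N} \sum_{k=1}^{K} w_{jik}\, a_i(t_n - \tau_{jik}) \right) = -w_{jik}\, a_i'(t_n - \tau_{jik}) \tag{A.2}$$

where N is the number of nodes in the previous layer and K is the maximum number of
time delays. Now define

$$\rho_j(t_n) = \frac{\partial E(t_n)}{\partial S_j} \tag{A.3}$$

Substitute Equations (A.2) and (A.3) into Equation (A.1) to obtain

$$\frac{\partial E(t_n)}{\partial \tau_{jik}} = -\rho_j(t_n)\, w_{jik}\, a_i'(t_n - \tau_{jik}) \tag{A.4}$$

Thus the learning rule as in Equation (3.8) is obtained

$$\Delta \tau_{jik} = -\eta_2 \frac{\partial E(t_n)}{\partial \tau_{jik}} = \eta_2\, \rho_j(t_n)\, w_{jik}\, a_i'(t_n - \tau_{jik}) \tag{A.5}$$

where $\eta_2$ is the learning rate for the time delays.

To derive $\rho_j(t_n)$, apply the chain rule and consider two cases:

$$\rho_j(t_n) = \frac{\partial E(t_n)}{\partial S_j} = \frac{\partial E(t_n)}{\partial a_j} \frac{\partial a_j(t_n)}{\partial S_j} = \frac{\partial E(t_n)}{\partial a_j}\, f'(S_j(t_n)) \tag{A.6}$$

To find $\partial E(t_n) / \partial a_j$, consider two cases:

1. If j is an output node:

$$\frac{\partial E(t_n)}{\partial a_j} = -\left( d_j(t_n) - a_j(t_n) \right) \tag{A.7}$$

and thus Equation (A.6) becomes

$$\rho_j(t_n) = -\left( d_j(t_n) - a_j(t_n) \right) f'(S_j(t_n)) \tag{A.8}$$

2. If j is a hidden node:

$$\frac{\partial E(t_n)}{\partial a_j} = \sum_{p \in P} \frac{\partial E(t_n)}{\partial S_p} \frac{\partial S_p(t_n)}{\partial a_j} = \sum_{p \in P} \rho_p(t_n) \frac{\partial}{\partial a_j} \left( \sum_{i \in N} \sum_{q=1}^{K} w_{piq}\, a_i(t_n - \tau_{piq}) \right) = \sum_{p \in P} \sum_{q=1}^{K} \rho_p(t_n)\, w_{pjq} \tag{A.9}$$

where P is the number of nodes in the next layer, N is the number of nodes in the previous
layer, and K is the maximum number of time delays. For this case Equation (A.6)
becomes

$$\rho_j(t_n) = f'(S_j(t_n)) \sum_{p \in P} \sum_{q=1}^{K} \rho_p(t_n)\, w_{pjq} \tag{A.10}$$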
APPENDIX B. Using the ATNN Code


B.1 An Interactive Environment

A main program, named testatnn, controls the operation of the ATNN algorithm. It
provides an interactive environment for training (the default option), testing, and running
the ATNN using the "-L", "-T", and "-R" options, respectively. The atnn command, if given
alone, defaults to a training example of the network named ATNN, which requires a
definition file (ATNN.DEF) and a fact file (ATNN.FCT). The "-n netname" option, if
added to the command line before the other options, allows the user to specify a particular
network name, using the structure (in the netname.DEF file) for an associated fact file
(netname.FCT) to begin training. Another command line option (-v) initiates a verbose
mode for tracing the network through the training or testing phase, then saving the weights
and time delays into a readable ASCII file (located in a netname.MAT file). As an
example, the command

atnn -n mytestfile -T -v

will execute the test part of the ATNN algorithm, with verbose output to the monitor,
using "mytestfile.TST" as the input fact file.

The program is interactive in that it allows the user to set a maximum number of
epochs or a tolerance on the target minus network output for determining program
completion, and the training process may be suspended (by pressing the ESC key on the
Silicon Graphics or any key on a PC) and saved at any point. Subsequent training
automatically resumes from the stored network information (located in the associated
.WTS file). This allows for testing at different points during the training process, which
helps in optimizing training.

Blum uses this same file extension naming convention throughout his neural
network tool kit. Implementations require basically the same set of files: files for
network definition (.DEF files), training data (.FCT files), test data (.TST files), and run
data (.IN files). Outputs save to a common set of files: binary learned network
information (.WTS files), ASCII learned network information (.MAT files) when -v is
invoked, and run results (.OUT files). Two additional files, implemented for this thesis,
save intermediate error information for later analysis and comparison: training error
(.ERR files) and test error (.TER files).

B.2  Training  with  the  ATNN 

Given the required files, atnn configures itself dynamically to the defined network
parameters. Memory is allocated for all the variables during initialization. The program
reads each vector pair, one at a time, to the end of the file. Thus, any number of patterns
may be presented to the network from the fact file. The network keeps learning to
optimize the weights and time delays until all the facts are within tolerance, the
maximum number of iterations (or cycles) is reached, or training is suspended.
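
The three stopping conditions described above can be sketched as a simple control loop. This is an illustrative outline under stated assumptions, not the thesis implementation: the names train_until_done, StopReason, and TrainResult are hypothetical, and the two callbacks stand in for presenting one fact to the network (returning true when every output is within tolerance) and for polling whether the user has asked to suspend.

```cpp
#include <cstddef>
#include <functional>

enum class StopReason { AllWithinTolerance, MaxEpochs, Suspended };

struct TrainResult {
    StopReason reason;
    int epochs;  // number of completed epochs
};

// Hypothetical training driver. encode_fact(i) trains on fact i and returns
// true if all outputs were within tolerance; suspended() reports a user
// request to stop (for example, a key press).
TrainResult train_until_done(std::size_t num_facts, int max_epochs,
                             const std::function<bool(std::size_t)>& encode_fact,
                             const std::function<bool()>& suspended)
{
    for (int epoch = 1; epoch <= max_epochs; ++epoch) {
        std::size_t good = 0;
        for (std::size_t i = 0; i < num_facts; ++i) {
            if (suspended())
                return {StopReason::Suspended, epoch - 1};
            if (encode_fact(i))
                ++good;
        }
        if (good == num_facts)  // every fact was a "good guess" this cycle
            return {StopReason::AllWithinTolerance, epoch};
    }
    return {StopReason::MaxEpochs, max_epochs};
}
```

Returning the reason together with the epoch count lets a caller decide whether to save the network and resume later or to move on to testing.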

During training, ATNN outputs information to the monitor for determining training
status. In the default mode, the cycle (or epoch) number displays followed by an "x",
indicating a bad guess (outside tolerance), or a ".", indicating a good guess (all outputs
within tolerance). In the verbose mode, several messages display information to trace the
outputs through the learning phase: Unsquashed guess (the linear output before applying
the sigmoid), Output layer threshold (used in the sigmoid function), Desired outputs,
and network Guessed outputs. Also, the words Bad guess or Good guess replace each
"x" or "." as appropriate. At the end of each cycle, the average MSE and percentage
correct display.

The fact file (with a .FCT extension) contains all the input-output pattern pairs,
called vector pairs, for training. In each of these files, the first line defines the vector of
minimums for the input features, followed by a comma, followed by the vector of
minimums for all the output factors (thus the name vector pair). The second line contains
the maximums for the inputs and outputs in the same manner. The third line is a comment
line which begins with a colon (":"). The subsequent lines each contain the actual vector
pairs. The maximums and minimums scale the values of each piece of data to a number
between 0 and 1. Values below the specified minimum or above the maximum for each
fact get converted to 0 or 1, respectively. Upon training completion, or suspension, the
learned parameters get saved to a binary file (netname.WTS file) and computed error and
accuracy, for each epoch, go to another file (netname.ERR). The resulting network may
then be tested or run.
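
The scaling step can be sketched as below. The text states only that values are scaled to a number between 0 and 1 and that out-of-range values become 0 or 1; the linear mapping, the handling of a degenerate range, and the function names scale_fact and unscale_fact are assumptions made for this example.

```cpp
#include <algorithm>

// Hypothetical helper: map a raw value onto [0, 1] using the minimum and
// maximum given on the first two lines of a fact file. A linear mapping is
// assumed. Values below min or above max are clamped to 0 or 1.
double scale_fact(double value, double min, double max)
{
    if (max <= min)
        return 0.0;  // degenerate range; this handling is an assumption
    const double scaled = (value - min) / (max - min);
    return std::min(1.0, std::max(0.0, scaled));
}

// Inverse of the assumed linear mapping, to express a network output in the
// original units of the data.
double unscale_fact(double scaled, double min, double max)
{
    return min + scaled * (max - min);
}
```

For a feature with minimum 1.0 and maximum 2.0, a raw value of 1.5 scales to 0.5, while 0.5 and 3.0 are clamped to 0 and 1. Clamped values cannot be recovered by the inverse mapping, which is one reason to choose the minimum and maximum lines generously.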


B.3 Testing and Running the ATNN

When invoked with the atnn -T command option, testatnn tests the trained network
on the facts stored in the test file (.TST file). The test fact file, formatted the same as the
training fact file, contains vector pairs for testing. An output file (with a .TER extension)
saves the desired output, network output, and instantaneous Mean Square Error (MSE)
for each input vector as well as a time averaged MSE for the entire test data set. To run
the trained network, giving it only the input vectors, invoke the atnn -R command option.
This requires an input file (with a .IN extension) in much the same format of minimum
line, maximum line, comment, and input features. However, no desired output vector is
included in the input file. The program creates a file of outputs (with a .OUT extension)
corresponding to the input features.
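
The two error figures written to the .TER file can be sketched as follows. The instantaneous error uses the one-half sum of squared differences that the training code accumulates; treating the time averaged MSE as the plain mean of the instantaneous values over the test set is an assumption of this example, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Instantaneous error for one test vector: 0.5 * sum_j (d_j - a_j)^2.
double instantaneous_mse(const std::vector<double>& desired,
                         const std::vector<double>& actual)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < desired.size() && j < actual.size(); ++j) {
        const double diff = desired[j] - actual[j];
        sum += diff * diff;
    }
    return 0.5 * sum;
}

// Time averaged error, assumed here to be the mean of the instantaneous
// errors over all vectors in the test set.
double time_averaged_mse(const std::vector<double>& per_vector_mse)
{
    if (per_vector_mse.empty())
        return 0.0;
    double total = 0.0;
    for (double e : per_vector_mse)
        total += e;
    return total / static_cast<double>(per_vector_mse.size());
}
```

For a single output with desired value 1.0 and network output 0.5, the instantaneous error is 0.125.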
APPENDIX C. Adaptive Time Delay Neural Network Source Code


This appendix contains the source code listings for the Adaptive Time Delay Neural
Network algorithm developed at AFIT called ATNN. It was written in C++ object-oriented
programming style and successfully compiles on the Silicon Graphics
workstations as well as on an IBM/compatible 486 personal computer using Turbo C++
3.0. The main shell program is named testatnn.cc.
////////////////////////////////////////////////////////////////////////
// TESTATNN.CC
// Interactive ATNN System Demonstration Program
// Used to verify ATNN system algorithms
// Developed with Turbo C++ 3.0
// Author: Capt James Gainey, GEO-93D   Last Modified: 10 Sep 93

#define NDEBUG 1    // ANSI method to enable or disable debugging
#include "atnn.hpp"
#include <getopt.h>

char netname[16] = "ATNN";  // file where test data is stored
char mode = 0;              // default to learn or training mode
int trace = 0;              // SET TRACE=<whatever> at DOS prompt to turn trace on
char *p;

int main(int argc, char **argv)
{
#ifdef __ZTC__
    // shouldn't have to do this! these should be defaults
    cout.setf(cout.unitbuf);       // turn "unitbuf" on to force flushing after each char
    cout.unsetf(cout.scientific);  // turn skipws and scientific off
#endif

    cout.precision(2);
    cout << "TEST_ATNN - Interactive Adaptive Time-Delay Network Tester\n";

    // Parse the command line: -n takes a network name, -L/-T/-R select the
    // mode, and -v turns on verbose tracing.
    int option;
    while ((option = getopt(argc, argv, "n:LTRv")) != -1)
    {
        switch (option)
        {
        case 'n':
            strcpy(netname, optarg);
            break;
        case 'L':
            mode = 'L';
            break;
        case 'T':
            mode = 'T';
            break;
        case 'R':
            mode = 'R';
            break;
        case 'v':
            trace = TRUE;
            break;
        default:
            break;
        }
    }

    atnn b(netname);

    switch (mode) {
    case 'T':
        b.test();
        break;
    case 'R':
        b.run();
        break;
    default:
    case 'L':
        b.train();
        break;
    }

    return 1;
}
////////////////////////////////////////////////////////////////////////
// ATNN.CC
// Implementation of an Adaptive Time-Delay Neural Net
// Developed with Turbo C++ 3.0
// Author: Capt James Gainey, GEO-93D   Last Modified: 10 Oct 93

#include "atnn.hpp"
extern int trace;

atnn::atnn(char *s) : net(s)    // constructor
{
    const int NOPARMS = 9;
    PARM parms[NOPARMS];

    // Names and types of the parameters read from the definition file
    strcpy(parms[0].name, "HIDDEN");
    parms[0].type = integer;
    strcpy(parms[1].name, "MOMENTUM");
    parms[1].type = real;
    strcpy(parms[2].name, "INITRANGE");
    parms[2].type = real;
    strcpy(parms[3].name, "MAXNUMTAU");
    parms[3].type = integer;
    strcpy(parms[4].name, "EPOCH");
    parms[4].type = integer;
    strcpy(parms[5].name, "TOLERANCE");
    parms[5].type = real;
    strcpy(parms[6].name, "RATE2");
    parms[6].type = real;
    strcpy(parms[7].name, "TSTEP");
    parms[7].type = real;
    strcpy(parms[8].name, "TDNN");
    parms[8].type = integer;

    readparms(NOPARMS, parms, name);
    q          = parms[0].val.i;
    momentum   = parms[1].val.f;
    initrange  = parms[2].val.f;
    K          = parms[3].val.i;
    epoch      = parms[4].val.i;
    tolerance  = parms[5].val.f;
    learnrate2 = parms[6].val.f;
    tstep      = parms[7].val.f;
    TDNN       = parms[8].val.i;

    // initialize weight matrices to random values from -1 to +1
    // and time delay matrices to integer values from 0 to K
    W1    = new mtrx3d(q, n, K, -initrange);
    W2    = new mtrx3d(p, q, K, -initrange);
    Tau1  = new mtrx3d(q, n, K, K);
    Tau2  = new mtrx3d(p, q, K, K);

    dW1   = new mtrx3d(q, n, K);
    dW2   = new mtrx3d(p, q, K);
    dTau1 = new mtrx3d(q, n, K);
    dTau2 = new mtrx3d(p, q, K);

    a_buf   = new matrix(n, K);
    h_buf   = new matrix(q, K);
    a_buf_p = new matrix(n, K);
    h_buf_p = new matrix(q, K);

    h = new vec(q);
    o = new vec(p);
    d = new vec(p);
    e = new vec(q);

    thresh1 = new vec(q);
    thresh1->randomize(initrange);
    thresh2 = new vec(p);
    thresh2->randomize(initrange);

    if (epoch) {
        totd = new vec(p);
        tote = new vec(q);
    }

    minvecs = new vecpair(n, p);
    maxvecs = new vecpair(n, p);

    cycleno = 0;
}

atnn::~atnn()
{
    delete W1;
    delete W2;
    delete Tau1;
    delete Tau2;

    delete dW1;
    delete dW2;
    delete dTau1;
    delete dTau2;

    delete h;
    delete o;
    delete d;
    delete e;

    if (epoch) {
        delete totd;
        delete tote;
    }

    delete minvecs;
    delete maxvecs;
}

////////////////////////////////////////////////////////////////////////
//
// ADAPTIVE TIME DELAY ALGORITHM METHODS - ENCODE AND RECALL
//
int atnn::encode(vecpair &v)
{
    float maxdiff;

    // Step 1): Propagate through to hidden layer nodes
    // Buffer inputs to a_buf[i][tn] for the last K timesteps
    buffer(a_buf, v.a);

    // Sum the K buffered time-delay connections. Each connection
    // is the product of a time delayed input and the weight
    // matrix. Get tdblocks for each input.
    propagate(*h, *W1, *Tau1, *a_buf);

    // Sum all the tdblocks into a node then apply sigmoid function
    h->sigmoid(*thresh1);

    // Step 2): Propagate through to the output nodes
    // Buffer the hidden layer activations to
    // h_buf[i][tn] for the last K timesteps
    buffer(h_buf, h);

    // Sum the K buffered time-delay connections. Each
    // connection is the product of a time-delayed input
    // to the hidden node and the weight matrix. Get
    // tdblocks for each hidden node.
    propagate(*o, *W2, *Tau2, *h_buf);

    // Sum all the tdblocks then apply sigmoid function
    if (trace) {
        cerr << "\nUnsquashed guess: " << *o
             << "\nOutput layer threshold " << *thresh2;
    }
    o->sigmoid(*thresh2);

    // Step 3): Compute the error for the output layer
    if (epoch) {    // adjust weights at the end of the cycle

        // d = o (1-o) (t-o)
        // the somewhat circuitous code is so that we can use existing
        // overloaded operators from the vector class
        *d = (*(v.b) - *o);

        if (trace) {
            cerr.precision(6);
            cerr << "\nDesired Output: " << *(v.b) << " Guess: " << *o;
        }
        maxdiff = d->maxval();
        *d = *d * o->d_logistic();  // d_logistic() returns v(1-v)

        // Step 4): Compute the error for hidden nodes
        backprop(*e, *d, *W2);
        *e = *e * h->d_logistic();  // returns dot product of vec & complement

        // weights will be adjusted at end of cycle with following totals
        *totd += *d;
        *tote += *e;
    }
    else {  // pattern-by-pattern training

        // Step 3): Compute the error for the output layer
        // d = o (1-o) (t-o)
        // the somewhat circuitous code is so that we can use existing
        // overloaded operators from the vector class
        *d = (*(v.b) - *o);
        MSE += (0.5 * (*d * *d));

        if (trace) {
            cerr.precision(6);
            cerr << "\nDesired Output: " << *(v.b) << " Guess: " << *o;
        }
        maxdiff = d->maxval();
        *d = *d * o->d_logistic();  // d_logistic() returns v(1-v)

        // Step 4): Compute the error for hidden nodes
        backprop(*e, *d, *W2);
        *e = *e * h->d_logistic();  // returns dot product of vec & complement

        // Step 5): Update weights and time-delays for the hidden to output layer
        // W2 = W2 + a h d + (momentum*dW2(t-1))
        initdWts(*dW2, *h_buf, *d, learnrate, momentum);  // "a h d + momentum" part
        (*W2) += *dW2;
        if (TDNN == 0) {
            // Tau2 = Tau2 + (rate * h' W2 d)
            deriv(*h_buf_p, *h_buf, *Tau2, tstep);
            initdTau(*dTau2, *W2, *h_buf_p, *d, learnrate2);
            (*Tau2) += *dTau2;
        }
        *thresh2 += ((*d) * learnrate);

        // Step 6): Update weights and time-delays for the input to hidden layer
        // W1 = W1 + a i e + (momentum*dW1(t-1))
        initdWts(*dW1, *a_buf, *e, learnrate, momentum);
        (*W1) += *dW1;
        if (TDNN == 0) {
            // Tau1 = Tau1 + (rate * i' W1 e)
            deriv(*a_buf_p, *a_buf, *Tau1, tstep);
            initdTau(*dTau1, *W1, *a_buf_p, *e, learnrate2);
            (*Tau1) += *dTau1;
        }
        *thresh1 += ((*e) * learnrate);

    }   // end pattern-by-pattern training

    // Step 7): Compare max difference to tolerance
    if (trace)
        cerr << "\nMaximum difference: " << maxdiff;
    if (maxdiff < tolerance) { return 1; }
    else {
        return 0;
    }
}   // end encode()


void atnn::deriv(matrix& m2, matrix& m1, mtrx3d& m3d1, const float tstep)
// Compute the derivative of node input for time-delay learning rule updates
{
    int j = 0;  // time-delay value is the same for all output nodes
                // associated with each input
    for (int i = 0; i < m3d1.width(); i++)
        for (int k = 0; k < K; k++) {
            int tau = int(m3d1.getval(j, i, k) + 0.5);  // round delay to nearest step
            if (tau == 0)
                // one-sided difference at the newest sample
                m2.setval(i, k, (((m1.getval(i, tau)) - (m1.getval(i, tau+1))) / tstep));
            else
                // central difference around the delayed sample
                m2.setval(i, k, (((m1.getval(i, tau-1)) - (m1.getval(i, tau+1))) / (2*tstep)));
        }
}


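void atnn::buffer(matrix *m1, vec *v1)
// Shift each node's stored history back one timestep, then store the
// newest value at index 0
{
    for (int i = 0; i < m1->depth(); i++)
    {
        for (int tn = K-1; tn >= 0; tn--)
        {
            m1->setval(i, tn+1, (m1->getval(i, tn)));
        }
        m1->setval(i, 0, (v1->v[i]));
    }
}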


vec atnn::propagate(vec& v1, mtrx3d& m3d1, mtrx3d& m3d2, matrix& m1)
// Double sum of product (weights * node inputs) for each time-delay on each input
{
    for (int j = 0; j < m3d1.depth(); j++)
        for (int i = 0; i < m3d1.width(); i++)
            for (int k = 0; k < K; k++)
            {
                // time delays may not go negative
                if (m3d2.getval(j, i, k) < 0.0) { m3d2.setval(j, i, k, 0.0); }
                int tau = int(m3d2.getval(j, i, k) + 0.5);
                v1.v[j] += (m3d1.getval(j, i, k)) * (m1.getval(i, tau));
            }
    return v1;
}


vec atnn::backprop(vec& v1, vec& v2, mtrx3d& m3d1)
// Double sum for computing back-propagation error to the hidden nodes
{
    for (int j = 0; j < m3d1.depth(); j++)
        for (int m = 0; m < m3d1.width(); m++)
            for (int k = 0; k < K; k++)
                v1.v[m] += (v2.v[j]) * (m3d1.getval(j, m, k));
    return v1;
}


void atnn::initdWts(mtrx3d& m3d1,matrix& m1,const vec& v1,
                    const float rate,const float momentum)
// Used to initialize a 3d matrix to the element by element product
// of v1 and m1 times the learn rate
// also adding in the previous contents of the 3d matrix
// multiplied by a momentum term.
{
    for(int i=0;i<m3d1.depth();i++)
        for(int j=0;j<m3d1.width();j++)
            for(int k=0;k<K;k++)
                m3d1.setval(i,j,k,(((m3d1.getval(i,j,k))*momentum)+
                                   ((v1.v[i])*(m1.getval(j,k))*rate)));
}
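
Each element written by initdWts follows the usual momentum form of the delta rule. A scalar sketch of that update is given below; momentum_step is a hypothetical name, and the function only restates the arithmetic of one element, not the three-dimensional bookkeeping.

```cpp
// Hypothetical helper (not part of the thesis code): one weight change with
// momentum, i.e. new_delta = momentum * prev_delta + rate * error * input.
float momentum_step(float prev_delta, float error, float input,
                    float rate, float momentum)
{
    return prev_delta * momentum + error * input * rate;
}
```

With momentum set to zero the result is the plain gradient step, which is what initdTau computes for the time delays (with the opposite sign and an extra weight factor).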


void atnn::initdTau(mtrx3d& m3d1,mtrx3d& m3d2,matrix& m1,const vec& v1,
                    const float rate)
// Used to initialize a 3d matrix to the element by element product
// of v1, m3d2, m1, and the learn rate. No momentum term.
{
    for(int i=0;i<m3d2.depth();i++)
        for(int j=0;j<m3d2.width();j++)
            for(int k=0;k<K;k++)
                m3d1.setval(i,j,k,((-1)*(v1.v[i])*(m3d2.getval(i,j,k))
                                   *(m1.getval(j,k))*rate));
}

vec atnn::recall( vec &v )
{
    // Step 1): h = F(W1 i)

    // Buffer inputs to a_buf[i][tn] for the last K timesteps
    buffer( a_buf, &v );

    // Sum the K buffered time-delay connections. Each connection
    // is the product of a time delayed input and the weight
    // matrix. Get tdblocks for each input.
    propagate(*h,*W1,*Tau1,*a_buf);

    h->sigmoid(*thresh1);

    // Step 2): o = F(W2 h)
    vec out(this->p);

    // Buffer the hidden layer activations to
    // h_buf[i][tn] for the last K timesteps
    buffer( h_buf, h );

    // Sum the K buffered time-delay connections. Each
    // connection is the product of a time-delayed input
    // to the hidden node and the weight matrix. Get
    // tdblocks for each hidden node.
    propagate(out,*W2,*Tau2,*h_buf);

    if(trace){
        cerr << "\nUnsquashed guess: " << out
             << "\nOutput layer threshold " << *thresh2;
    }

    out.sigmoid(*thresh2);
    return out;
}

// 

//  This  will  get  called  from  the  neural  network  train  since  train 
//  will  call  the  most  derived  cycle  method. 

// We need to override the network cycle since the time delay algorithm
// may require the weights to be updated at the end of a cycle.

float atnn::cycle(istream& s)
{
    vecpair v(n,p);
    int good = 0;
    int total = 0;

    s >> *minvecs;
    s >> *maxvecs;

    if(epoch) // initialize error accumulation vectors
    {
        for(int i=0;i<totd->length();i++)
            totd->set(i);
        for(i=0;i<tote->length();i++)
            tote->set(i);
    }

    skipcmt(s);

    int okay = TRUE;
    int td = total;

    while ( td < K ) // Buffer inputs to a_buf[i][tn] for K timesteps
    {
        s >> v;
        if(s.eof()||s.fail())
            okay = FALSE;

        if ( okay )
        {
            v.scale(*minvecs,*maxvecs);
            buffer( a_buf, v.a ); td++;
        }
    }


    while ( okay )
    {
        s >> v;

        if(s.eof()||s.fail())
            okay = FALSE;

        if ( okay )
        {
            v.scale(*minvecs,*maxvecs);

            if(encode(v))
            {
                good++;

if(!trace) 

cerr  « 
else 

cerr << "\nGood guess";

} 

else 

{ 

if(!trace) 

cerr«  V; 
else 

cerr << "\nBad guess";

} 
        }

        total++;

#ifdef __TURBOC__
        if(kbhit()){ return 1.0;}
#else
        if(getbutton(ESCKEY))
        {
            return 1.0;
        }
#endif
    }

    if(epoch){ // adjust weights at end of cycle
        // W2 = W2 + a h d (total)
        initdWts(*dW2,*h_buf,*totd,learnrate,momentum); // "a h d" part
        (*W2) += *dW2;

        if(TDNN==0){
            // Tau2 = Tau2 + (rate * h' W2 d(total))
            deriv(*h_buf_p,*h_buf,*Tau2,tstep);
            initdTau(*dTau2,*W2,*h_buf_p,*totd,learnrate);
            (*Tau2) += *dTau2;
        }

        *thresh2 += ( (*totd) * learnrate );

        // W1 = W1 + a i e (total)
        initdWts(*dW1,*a_buf,*tote,learnrate,momentum);
        (*W1) += *dW1;

        if(TDNN==0){
            // Tau1 = Tau1 + (rate * i' W1 e(total))
            deriv(*a_buf_p,*a_buf,*Tau1,tstep);
            initdTau(*dTau1,*W1,*a_buf_p,*tote,learnrate);
            (*Tau1) += *dTau1;
        }

        *thresh1 += ( (*tote) * learnrate );
    }

    avg_MSE = MSE / float(total);
    accuracy = float(good)/float(total);

    char errfn[32];
    sprintf(errfn,"%s.ERR",name);
    ofstream errf(errfn,ios::app);

    errf << cycleno << "\t" << avg_MSE << "\t" << accuracy << endl;
    errf.close();

    MSE=0.0;

    cerr << "\n" << "avg_MSE is " << avg_MSE;
    cerr << "\n" << accuracy * 100 << " percent correct. \n";
    return accuracy;
}

/////////////////////////////////////////////////////////////
//
// ADAPTIVE TIME DELAY ALGORITHM METHODS - TEST AND RUN
//

float atnn::test()
{
    int good = 0;
    int total = 0;
    char tstfn[32];
    vecpair v(n,p);
    vec out(p);

    if(!loadweights())
    {
        cout << "No stored network to test.";
        return 0;
    }

    sprintf(tstfn,"%s.TST",name);
    ifstream tstf(tstfn,ios::in);

    tstf >> *minvecs;
    tstf >> *maxvecs;

    // skip comment
    skipcmt(tstf);

    int okay = TRUE;
    int td = total;

    while ( td < K ) // Buffer inputs to a_buf[i][tn] for K timesteps
    {
        tstf >> v;

        if(tstf.eof()||tstf.fail())
            okay = FALSE;

        if ( okay )
        {
            v.scale(*minvecs,*maxvecs);
            buffer( a_buf, v.a ); td++;
        }
    }

    while ( okay )
    {
        tstf >> v;

        if(tstf.eof()||tstf.fail())
            okay = FALSE;

        if ( okay )
        {
            v.scale(*minvecs,*maxvecs);

            out=recall(*(v.a));

            if( (*(v.b)-out).maxval() < tolerance )
                good++;

            tMSE = (0.5 * ((*(v.b)-out) * (*(v.b)-out)));
            MSE+=tMSE;
            total++;
        }

        char tsterrfn[32];
        sprintf(tsterrfn,"%s.TER",name);
        ofstream tsterrf(tsterrfn,ios::app);

        tsterrf << total << "\t" << *(v.b) << "\t" << out
                << "\t" << tMSE << endl;
        tMSE = 0.0;

        tsterrf.close();
    }

    char tsterrfn[32];
    sprintf(tsterrfn,"%s.TER",name);
    ofstream tsterrf(tsterrfn,ios::app);

    tsterrf << "EPOCH # : " << cycleno << "\t"
            << "time avg MSE = " << MSE/total << endl;

    MSE = 0.0;
    tsterrf.close();

    cerr << "\n" << float(good)/float(total) * 100 << " percent correct. \n";
    return float(good)/float(total);
}


void atnn::run()
{
    char ifn[16],ofn[16];
    vec in(n),out(p),minvec(n),maxvec(n);

    if(!loadweights()){
        cout << "No stored network to run.\n";
        return;
    }

    sprintf(ifn,"%s.IN",name);
    sprintf(ofn,"%s.OUT",name);

    cout << "Running from " << ifn << "\n";
    cout << "Output to " << ofn << "\n";

    ifstream inf(ifn,ios::in);
    ofstream outf(ofn,ios::out);

    inf >> minvec;
    inf >> maxvec;

    // skip comment
    skipcmt(inf);

    int okay = TRUE;
    int td = 0;

    while ( td < K ) // Buffer inputs to a_buf[i][tn] for K timesteps
    {
        inf >> in;

        if(inf.eof()||inf.fail())
            okay = FALSE;

        if ( okay )
        {
            in.scale(minvec,maxvec);
            buffer( a_buf, &in ); td++;
        }
    }

    while ( okay )
    {
        inf >> in;

        if(inf.eof()||inf.fail())
            okay = FALSE;

        if ( okay )
        {
            in.scale(minvec,maxvec);
            outf << recall(in);
        }
    }

    return;
}


//////////////////////////////////////////
//
// ATNN LEVEL INPUT/OUTPUT METHODS:
// Saving and loading weights, skipping comments.


int atnn::saveweights()
{
    FILE *f;
    char fn[32];

    sprintf(fn,"%s.WTS",name);
    f=fopen(fn,"wb");

#ifdef __TURBOC__
    if(f <= 0) // couldn't open the file
    {
#else
    if(f == NULL) // couldn't open the file
    {
#endif
        cerr << "Open of file " << fn << " save failed.\n";
        return 0;
    }
    else
    {
    }

    fwrite(&cycleno,sizeof(int),1,f);

    if( !(W1->save(f))
     || !(W2->save(f))
     || !(Tau1->save(f))
     || !(Tau2->save(f))
     || !(thresh1->save(f))
     || !(thresh2->save(f))
      )
    {
        cerr << "Nothing to save in file " << fn << ".\n";
        fclose(f);
        return 0;
    }
    else
    {
        cerr << "Saved Weights and Taus in file " << fn << ".\n";
        fclose(f);
    }

    // put matrices into ".MAT" in readable form
    if(trace){
        sprintf(fn,"%s.MAT",name);
        ofstream matf(fn,ios::out);

        matf << "First weight matrix contains: \n"
             << *W1
             << "First time-delay matrix contains: \n"
             << *Tau1
             << "Second weight matrix contains: \n"
             << *W2
             << "Second time-delay matrix contains: \n"
             << *Tau2;
    }

    return 1;
}


int atnn::loadweights()
{
    FILE *f;
    char fn[32];
    int ret_val = FALSE;

    sprintf(fn,"%s.WTS",name);
    f=fopen(fn,"rb");

#ifdef __TURBOC__
    if(f<=0) // couldn't open the file
    {
#else
    if(f == NULL) // couldn't open the file
    {
#endif
    }
    else
    {
        ret_val = TRUE;
    }

    if ( ret_val )
    {
        fread(&cycleno,sizeof(int),1,f);

        if( !(W1->load(f))
         || !(W2->load(f))
         || !(Tau1->load(f))
         || !(Tau2->load(f))
         || !(thresh1->load(f))
         || !(thresh2->load(f)))
        {
            ret_val = FALSE;
        }
        else
        {
            ret_val = TRUE;
        }

        fclose(f);
    }

    return ret_val;
}

////////////////////////////////////////////////////////////////////////
// NET.CC
// Source code for abstract neural network base class
#include "net.hpp"

#ifndef __TURBOC__
#define _MAX_PATH 16
#endif


///////////////////////////////////
// Parameter table functions

int readparms(int n,PARM *p,char *name)
{
    char fn[16];

    sprintf(fn,"%s.DEF",name);
    ifstream def(fn,ios::in);
    if(!def){
        cerr << "Failed to find definition file.\n";
        return 0;
    }

    // read one parameter per pass until the definition file is exhausted
    while (readparm(def,n,p) && !def.eof())
        ;
    return 0;
}

istream& readparm(istream& s,int noparms,PARM *p)
// This stream extraction operator takes input from network definition file
// for one definition parameter. It reads in the name of the parameter
// and then looks up which entry in the parameter table to instantiate
// with a value.
{
    char keyword[NAMELEN],val[16];
    s >> keyword;

    if(!s || s.eof() || s.fail()) // end of file or failure to read keyword
        return s;

    for(int i=0;i<noparms;i++)
        if( !strcmp(keyword,p[i].name))
            break;

    if(i < noparms) // recognized parameter
        switch(p[i].type){
            case string: s >> p[i].val.s; break;
            case integer: s >> p[i].val.i; break;
            case real: s >> p[i].val.f; break;
        }
    else
        s >> val; // unrecognized keyword: read and discard its value
    return s;
}


/////////////////////////////////
// NET CLASS
// Abstract neural net class methods
//

net::net(char *s)
{
    char fn[16];

    name=new char[strlen(s)+1];
    strcpy(name,s);

    const NOPARMS=5;

    PARM parms[NOPARMS];

    strcpy(parms[0].name,"INPUTS");
    parms[0].type = integer;

    strcpy(parms[1].name,"OUTPUTS");
    parms[1].type = integer;

    strcpy(parms[2].name,"RATE");
    parms[2].type = real;

    strcpy(parms[3].name,"DECAY");
    parms[3].type = real;

    strcpy(parms[4].name,"ITERS");
    parms[4].type = integer;

    readparms(NOPARMS,parms,name);
    n = parms[0].val.i;
    p = parms[1].val.i;
    learnrate = parms[2].val.f;
    decayrate = parms[3].val.f;
    iters = parms[4].val.i;
    return;
}

net::~net()
{
    delete [] name; // name was allocated with new[]
}

void net::train()
{
    ifstream *s;
    float ret;
    char fn[_MAX_PATH];

    sprintf(fn,"%s.FCT",name);

    if( loadweights() )
    {
        cerr << "Training from stored weights.\n";
    }
    else
    {
        cerr << "Training from new weights.\n";
    }

#ifdef __TURBOC__
    cerr << "Training from " << fn << ". Press any key to stop.\n";
#else
    cerr << "Training from " << fn << ". Press ESC key to stop.\n";
#endif

    int okay = TRUE;
    while( okay )
    {
        s=new ifstream(fn,ios::in);

        if(!*s){
            cerr << "Failed to open fact file.\n";
            return;
        }

cerr  «  "Cycle "  « -H-cydeno  « 
if  (  cycleno  >=  iters )  {  okay  =  FALSE; } 

ret=cycle(*s); 
delete  s; 

#ifdef __TURBOC__
        if(ret>=1.0 || kbhit())
        {
#else
        if(ret>=1.0 || getbutton( ESCKEY ) )
        {
#endif
            cerr << "Training suspended at " << cycleno << " cycles. \n";
            okay = FALSE;
        }
    }

    saveweights();

    return;
}


float net::cycle(istream& s)
{
    vecpair v(n,p);
    int good = 0;
    int total = 0;

    skipcmt(s);
    for(;;){
        s >> v;
        if(s.eof()||s.fail())break;
        if(encode(v))
            good++;
        total++;
    }

    return float(good)/float(total);
}

int net::skipcmt(istream& inf)
{
    int c;

    inf.unsetf(inf.skipws);

    if(inf.peek()==':'){
        // a leading ':' marks a comment line; discard up to end of line
        do{
            c=inf.get();
            if(c<0)
                return 0;
        } while( (c!=0xd) && (c!=0xa) );

        inf.setf(inf.skipws);
        return 1;
    }
    else{
        inf.setf(inf.skipws);
        return 0;
    }
}


float net::test()
{
    int good = 0;
    int total = 0;
    char tstfn[32];
    vecpair v(n,p);
    vec out(p);

    if(!loadweights())
    {
        cout << "No stored network to test.";
        return 0;
    }

    sprintf(tstfn,"%s.TST",name);
    ifstream tstf(tstfn,ios::in);

    // skip comment
    skipcmt(tstf);

    for(;;){
        if( !(tstf>>v))break;
        out=recall(*(v.a));

        if( (*(v.b)-out).maxval() < tolerance )
            good++;
        total++;
    }

    cerr << "\n" << float(good)/float(total) * 100 << " percent correct.\n";
    return float(good)/float(total);
}

void net::run()
{
    char ifn[16],ofn[16];
    // int c;
    vec in(n),out(p);

    if(!loadweights()){
        cout << "No stored network to run.\n";
        return;
    }

    sprintf(ifn,"%s.IN",name);
    sprintf(ofn,"%s.OUT",name);

    cout << "Running from " << ifn << "\n";
    cout << "Output to " << ofn << "\n";

    ifstream inf(ifn,ios::in);
    ofstream outf(ofn,ios::out);

    skipcmt(inf);

    for(;;){
        if( !(inf>>in))break;
        if(!inf || inf.eof()|| inf.fail())break;
        outf << recall(in);
    }

    return;
}

/////////////////////////////////////////////////////////
// VECMAT.CC
// vector and matrix class methods
// Author: Capt James Gainey, GEO-93D Last Modified: 10 Sep 93
// Modified from VECMAT.CPP Adam Blum (1990)


#include <vecmat.hpp>


///////////////////////////////
// vector class member functions

vec::vec(int size,int val)
{
    v = new float[n=size];
    for(int i=0;i<n;i++)
        v[i]=val;
} // constructor

vec::~vec() { delete [] v;} // destructor (v was allocated with new[])
vec::vec(vec& v1) // copy-initializer
{
    v=new float[n=v1.n];
    for(int i=0;i<n;i++)
        v[i]=v1.v[i];
}

vec& vec::operator=(const vec& v1)
{
    delete [] v;
    v=new float[n=v1.n];
    for(int i=0;i<n;i++)
        v[i]=v1.v[i];
    return *this;
}

vec vec::operator+(const vec& v1)
{
    vec sum(v1.n);
    for(int i=0;i<v1.n;i++)
        sum.v[i]=v1.v[i]+v[i];
    return sum;
}

vec vec::operator+(const float d)
{
    vec sum(n);
    for(int i=0;i<n;i++)
        sum.v[i]=v[i]+d;
    return sum;
}

vec& vec::operator+=(const vec& v1)
{
    for(int i=0;i<v1.n;i++)
        v[i]+=v1.v[i];
    return *this;
}

float vec::operator*(const vec& v1) // dot-product
{
    float sum=0;
    for(int i=0;i<min(n,v1.n);i++)
        sum+=(v1.v[i]*v[i]);
    return sum;
}

int vec::operator==(const vec& v1)
{
    if(v1.n!=n)return 0;
    for(int i=0;i<min(n,v1.n);i++){
        if(v1.v[i]!=v[i]){
            return 0;
        }
    }
    return 1;
}

float vec::operator[](int x)
{
    if(x<length() && x>=0)
        return v[x];
    else
        cerr << "vec index out of range";
    return 0;
}

int vec::length(){return n;} // length method

vec& vec::garble(float noise) // corrupt vector w/random noise
{
    time_t t;
    time(&t);
    srand((unsigned)t);
    for(int i=0;i<n;i++){
        if((rand()%10)/10<noise)
            v[i]=1-v[i];
    }
    return *this;
}

vec& vec::normalize() // normalize by length
{
    for(int i=0;i<n;i++)
        v[i]/=n;
    return *this;
}

vec& vec::normalizeon() // normalize by nonzero elements
{
    int on=0;
    for(int i=0;i<n;i++)
        if(v[i])
            on++;
    for(i=0;i<n;i++)
        v[i]/=on;
    return *this;
}

vec& vec::randomize(float range)
{
    time_t t;
    int pct,val,rnd;
    if(range){
        time(&t);
        srand((unsigned)t);
    }

    for(int i=0;i<n;i++){
        rnd=rand();
        pct=(int) (range * 100.0);
        val= rnd % pct;
        v[i]= (float) val / 100.0 ;
        if(range<0)
            v[i] = fabs(range) - (v[i] * 2.0);
    }
    return *this;
}

float vec::maxval() // returns maximum ABSOLUTE value
{
    float mx=0;
    for(int i=0;i<n;i++)
        if(fabs(v[i])>mx){
            mx=fabs(v[i]);
        }
    return mx;
}

vec& vec::scale(vec& minvec,vec& maxvec)
{
    // map each element linearly onto [0,1], clipping values outside
    // the [min,max] range
    for(int i=0;i<n;i++){
        if(v[i]<minvec.v[i])
            v[i]=0;
        else if(v[i]>maxvec.v[i])
            v[i]=1;
        else if((maxvec.v[i]-minvec.v[i])==0)
            v[i]=1;
        else
            v[i]=(v[i]-minvec.v[i])/(maxvec.v[i]-minvec.v[i]);
    }
    return *this;
}

float vec::d_logistic() // returns vec * (1-vec)
{
    float sum=0.0;
    for(int i=0;i<n;i++)
        sum+=(v[i]*(1-v[i]));
    return sum;
}

// Euclidean distance function ||A-B||
float vec::distance(vec& A)
{
    float sum=0,d;
    for(int i=0;i<n;i++){
        d=v[i]-A.v[i];
        if(d)sum+=pow(d,2);
    }
    return sum?pow(sum,0.5):0;
}

// index of the highest item in vector
int vec::maxindex()
{
    int idx,i;
    float mx;
    for(i=0,mx=-INT_MAX;i<n;i++)
        if(v[i]>mx){
            mx=v[i];
            idx=i;
        }
    return idx;
}


double  logistic(double  activation) 

{ 

    /* These underflow limits were copied from McClelland's bp implementation.
       We had problems with underflow with numbers that should have been
       small enough in magnitude. McClelland seems to have encountered this
       and established the numbers below as reasonable limits. - AB */

    if(activation>11.5129)
        return 0.99999;

    if(activation<-11.5129)
        return 0.00001;

return  1.0/(1.0+exp(-activation)); 

} 
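
The limit 11.5129 is approximately ln(100000), which is the point where the logistic function itself reaches about 0.99999, so the clamp is continuous with the unclamped curve. The sketch below restates the function so that this can be checked directly; clamped_logistic is a hypothetical name, and the code assumes a modern <cmath>.

```cpp
#include <cmath>

// Hypothetical restatement (not part of the thesis code) of the clamped
// logistic above: outside +/-11.5129 the output is pinned to 0.99999 or
// 0.00001 instead of evaluating exp().
double clamped_logistic(double activation)
{
    const double limit = 11.5129; // roughly ln(1e5)
    if (activation > limit)
        return 0.99999;
    if (activation < -limit)
        return 0.00001;
    return 1.0 / (1.0 + std::exp(-activation));
}
```

Keeping the output strictly inside (0,1) also keeps the derivative term y(1-y) from collapsing to exactly zero during training.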

vec& vec::getstr(char *s)
{
    for(int i=0;i<MAXVEC&&s[i];i++){
        if(isalpha(s[i]))
            v[toupper(s[i])-'A']=1;
    }
    return *this;
}

void vec::putstr(char *s)
{
    int ct=0;
    for(int i=0;i<26;i++)
        if(v[i]>0.9)
            s[ct++]='A'+i;
}

vec vec::operator-(const vec& v1)
{
    vec diff(n);
    for(int i=0;i<n;i++)
        diff.v[i]=v[i]-v1.v[i];
    return diff;
}

vec vec::operator-(const float d) // subtraction of constant
{
    vec diff(n);
    for(int i=0;i<n;i++)
        diff.v[i]=v[i]-d;
    return diff;
}

vec vec::operator*(float c)
{
    vec prod(length());
    for(int i=0;i<prod.n;i++)
        prod.v[i]=v[i]*c;
    return prod;
}

vec& vec::operator*=(float c)
{
    for(int i=0;i<n;i++)
        v[i]*=c;
    return *this;
} // vector multiply by constant


const SCALE=1;

vec& vec::sigmoid(vec& thresh)
// this is the sigmoid activation function we have chosen for
// our backprop implementation. It happens to use the logistic
// function: 1/(1+e^-x)
{
    for(int i=0;i<n;i++)
        v[i] = (float) logistic( (double) (SCALE * (v[i]+thresh[i])) );
    return *this;
}

vec& vec::set(int i,float f)
{
    v[i]=f;
    return *this;
}


istream& operator>>(istream& s,vec& v1)
// format: list of floating point numbers followed by ','
{
    float d;int i=0,c;

    for(;;){
        s>>d;

        if(s.eof())
            return s;

        if(s.fail()){
            s.clear();
            do
                c=s.get();
            while(c!=',' && c);
            return s;
        }

        v1.v[i++]=d;

        if(i==v1.n){
            do
                c=s.get();
            while(c!=',');
            return s;
        }
    }
}


ostream& operator<<(ostream& s,vec& v1)
// format: list of floating point numbers followed by ','
{
    s.precision(6);
    for(int i=0;i<v1.n;i++)
        s<<v1[i]<<" ";
    s<<",";
    return s;
}


int vec::save(FILE *f) // save binary values of matrix to specified file
{
    int success=1;
    for(int i=0;i<n;i++)
        if(fwrite(&(v[i]),sizeof(v[i]),1,f) < 1)
            success=0;
    return success;
}

int vec::load(FILE *f) // load binary values of matrix from specified file
{
    int success=1;
    for(int i=0;i<n;i++)
        if(fread(&(v[i]),sizeof(v[0]),1,f) < 1)
            success=0;
    return success;
}


///////////////////////////////
// matrix member functions
matrix::matrix(int n,int z,float range)
{
    int i,j,rnd;time_t t;
    int pct;

    m=new float *[n];
    if(range){
        time(&t);
        srand((unsigned)t);
    }

    for(i=0;i<n;i++){
        m[i]=new float[z];
        for(j=0;j<z;j++){
            if(range){
                rnd=rand();
                pct=(int) (range * 100.0);
                m[i][j]= (float)(rnd % pct) / 100.0 ;
                if(range<0)
                    m[i][j] = fabs(range) - (m[i][j] * 2.0);
            }
            else
                m[i][j]=0;
        }
    }

    r=n;
    c=z;
}

matrix::matrix(int n,int p,float value,float range)
{
    int i,j;

    i=int(range);
    m=new float *[n];
    for(i=0;i<n;i++){
        m[i]=new float[p];
        for(j=0;j<p;j++)
            m[i][j]=value;
    }

    r=n;
    c=p;
}

matrix::matrix(int n,int p,char *fn)
{
    int i;
    //int j,rnd;
    //time_t t;

    m=new float *[n];
    for(i=0;i<n;i++){
        m[i]=new float[p];
    }

    r=n;
    c=p;

    ifstream in(fn,ios::in);
    in >> *this;
}

matrix::matrix(const vecpair& vp)
{
    int j;

    r=vp.a->length();
    c=vp.b->length();
    m=new float *[r];
    for(int i=0;i<r;i++){
        m[i]=new float[c];
        for(j=0;j<c;j++)
            m[i][j]=((vp.a)->v[i])*((vp.b)->v[j]);
    }
}// constructor

matrix::matrix(vec& v1,vec& v2)
{
    int j;

    r=v1.length();
    c=v2.length();
    m=new float *[r];
    for(int i=0;i<r;i++){
        m[i]=new float[c];
        for(j=0;j<c;j++)
            m[i][j]=v1.v[i]*v2.v[j];
    }
}// constructor

matrix::matrix(matrix& m1) // copy-initializer
{
    r=m1.r;
    c=m1.c;

    m=new float *[r];
    for(int i=0;i<r;i++){
        m[i]=new float[c];
        for(int j=0;j<c;j++)
            m[i][j]=m1.m[i][j];
    }
}

matrix::~matrix()
{
    delete []m;
} // destructor

matrix& matrix::operator=(const vecpair& vp)
{
    int j;double d;
    r=vp.a->length();
    c=vp.b->length();
    for(int i=0;i<r;i++){
        for(j=0;j<c;j++){
            d=((vp.a)->v[i])*((vp.b)->v[j]);
            m[i][j]=(float)d;
        }
    }
    return *this;
}

matrix& matrix::operator=(const matrix& m1)
{
    for(int i=0;i<r;i++)
        delete m[i];
    r=m1.r;
    c=m1.c;

    m=new float*[r];
    for(i=0;i<r;i++){
        m[i]=new float[c];
        for(int j=0;j<c;j++) // column bound is c (the listing had r)
            m[i][j]=m1.m[i][j];
    }
    return *this;
}

matrix matrix::operator+(const matrix& m1)
{
    int i,j;

    matrix sum(r,c);
    for(i=0;i<r;i++)
        for(j=0;j<c;j++) // column bound is c (the listing had r)
            sum.m[i][j]=m1.m[i][j]+m[i][j];
    return sum;
}


matrix& matrix::operator*(const float d)
{
    int i,j;

    for(i=0;i<r;i++)
        for(j=0;j<c;j++)
            m[i][j]*=d;
    return *this;
}

vec matrix::colslice(int col)
{
    vec temp(r);
    for(int i=0;i<r;i++)
        temp.v[i]=m[i][col];
    return temp;
}

vec matrix::rowslice(int row)
{
    vec temp(c);
    for(int i=0;i<c;i++)
        temp.v[i]=m[row][i];
    return temp;
}

void matrix::insertcol(vec& v,int col)
{
    for(int i=0;i<v.n;i++)
        m[i][col]=v.v[i];
}

void matrix::insertrow(vec& v,int row)
{
    for(int i=0;i<v.n;i++)
        m[row][i]=v.v[i];
}

int matrix::depth(){return r;}
int matrix::width(){return c;}

float matrix::getval(int row,int col)
{
    return m[row][col];
}

void matrix::setval(int row,int col,float val)
{
    m[row][col] = val;
}

int matrix::closestcol(vec& v)
{
    int mincol;
    float d;
    float mindist=INT_MAX;
    vec w(r);

    for(int i=0;i<c;i++){
        w=colslice(i);
        if( (d=v.distance(w)) < mindist){
            mindist=d;
            mincol=i;
        }
    }

    return mincol;
}

int matrix::closestrow(vec& v)
{
    int minrow;
    float d;
    float mindist=INT_MAX;
    vec w(c);

    for(int i=0;i<r;i++){
        w=rowslice(i);
        if( (d=v.distance(w)) < mindist){
            mindist=d;
            minrow=i;
        }
    }

    return minrow;
}

int matrix::closestrow(vec& v,int *wins,float scaling)
{
    int minrow;
    float d;
    float mindist=INT_MAX;
    vec w(c);

    for(int i=0;i<r;i++){
        w=rowslice(i);
        d=v.distance(w);
        d*=(1+((float)wins[i]*scaling));
        if( d < mindist){
            mindist=d;
            minrow=i;
        }
    }

    return minrow;
}


int matrix::save(FILE *f) // save binary values of matrix to specified file
{
    int success=1;
    for(int i=0;i<r;i++)
        for(int j=0;j<c;j++)
            if(fwrite(&(m[i][j]),sizeof(m[0][0]),1,f) < 1)
                success=0;

    return success;
}

int matrix::load(FILE *f) // load binary values of matrix from specified file
{
    int success=1;
    for(int i=0;i<r;i++)
        for(int j=0;j<c;j++)
            if(fread(&(m[i][j]),sizeof(m[0][0]),1,f) < 1)
                success=0;

    return success;
}

#ifdef __TURBOC__
int _Cdecl intcmp(const void* i1, const void *i2)
#else
int intcmp(const void* i1, const void *i2)
#endif
{
    if(*(int *)i1 > *(int *)i2)
        return 1;
    if(*(int *)i1 < *(int *)i2)
        return -1;
    return 0;
}


matrix& matrix::operator+=(const matrix& m1)
{
    int i,j;

    for(i=0;i<r&&i<m1.r;i++)
        for(j=0;j<c&&j<m1.c;j++)
            m[i][j]+=(m1.m[i][j]);

    return *this;
}

matrix& matrix::operator*=(const float d)
{
    int i,j;

    for(i=0;i<r;i++)
        for(j=0;j<c;j++)
            m[i][j]*=d;
    return *this;
}

vec matrix::operator*(vec& v1)
{
    // multiply by columns when v1 matches the row count, otherwise by rows
    vec temp(v1.n==r?c:r),temp2(v1.n==r?r:c);
    for(int i=0;i<((v1.n==r)?c:r);i++){
        if(v1.n==r)
            temp2=colslice(i);
        else
            temp2=rowslice(i);
        temp.v[i]=v1*temp2;
    }

    return temp;
}

void matrix::initvals(const vec& v1,const vec& v2,const float rate,const float
                      momentum)
{
    int j;

    for(int i=0;i<r;i++)
        for(j=0;j<c;j++)
            m[i][j]=(m[i][j]*momentum)+((v1.v[i]*v2.v[j])*rate);
}


ostream& operator<<(ostream& s,matrix& m1)
// print a matrix
{
    for(int i=0;i<m1.r;i++){
        for(int j=0;j<m1.c;j++){
            s<<m1.m[i][j]<<" ";
        }
        s << "\n";
    }

    return s;
}

istream& operator>>(istream& s,matrix& m1)
{
    for(int i=0;i<m1.r;i++){
        for(int j=0;j<m1.c;j++){
            s>>m1.m[i][j];
        }
    }

    return s;
}

//////////////////////////////////////////
// vecpair member functions

//////////////////
// constructors

vecpair::vecpair(int n,int p,int val)
{
    a=new vec(n,val);b=new vec(p,val);
}

vecpair::vecpair(vec& A,vec& B)
{
    a=new vec(A.length());
    *a=A;
    b=new vec(B.length());
    *b=B;
}

vecpair::vecpair(const vecpair& AB) // copy-initializer
{
    a=new vec((AB.a)->length());
    b=new vec((AB.b)->length());
    *a=*(AB.a);
    *b=*(AB.b);
}

vecpair::~vecpair() {
    delete a; delete b;
} // destructor

vecpair& vecpair::operator=(const vecpair& v1)
{
    *a=*(v1.a);
    *b=*(v1.b);
    return *this;
}

vecpair& vecpair::scale(vecpair& minvecs,vecpair& maxvecs)
{
    a->scale(*(minvecs.a),*(maxvecs.a));
    b->scale(*(minvecs.b),*(maxvecs.b));
    return *this;
}

int vecpair::operator==(const vecpair& v1)
{
    return (*a == *(v1.a)) && (*b == *(v1.b));
}

istream& operator>>(istream& s,vecpair& v1)
// input a vector pair
{
    s>>*(v1.a)>>*(v1.b);
    return s;
}

ostream& operator<<(ostream& s,vecpair &v1)
// print a vector pair
{
    return s<<*(v1.a)<<*(v1.b)<<"\n";
}


///////////////////////////////
// 3d matrix member functions

// constructors

mtrx3d::mtrx3d(int n,int p,int m,float range)
{
    int i,j,k,rnd;time_t t;
    int pct;

    m3d=new float **[n];
    if(range){
        time(&t);
        srand((unsigned)t);
    }

    for(i=0;i<n;i++){
        m3d[i]=new float*[p];
        for(j=0;j<p;j++){
            m3d[i][j]=new float[m];
            for(k=0;k<m;k++){
                if(range){
                    rnd=rand();
                    pct=(int) (range * 100.0);
                    m3d[i][j][k]= (float)(rnd % pct) / 100.0 ;
                    if(range<0)
                        m3d[i][j][k] = fabs(range) - (m3d[i][j][k] * 2.0);
                }
                else
                    m3d[i][j][k]=0;
            }
        }
    }

    r=n;
    c=p;
    z=m;
}

mtrx3d::mtrx3d(int n,int p,int m,int irange)
{ // constructor
    int value,i,j,k;
    m3d=new float **[n];
    for(i=0;i<n;i++){
        m3d[i]=new float*[p];
        for(j=0;j<p;j++){
            m3d[i][j]=new float[m];
            value=0;
            for(k=0;k<m;k++){
                if(irange==m){
                    // fill each delay block with 0,1,...,m-1
                    m3d[i][j][k]=value;
                    value++;
                }
                else
                    m3d[i][j][k]=0;
            }
        }
    }

    r=n;
    c=p;
    z=m;
}

mtrx3d::mtrx3d(int n,int p,int m,float value,float range)
{ // constructor
    int i,j,k;
    i=int(range);
    m3d=new float **[n];
    for(i=0;i<n;i++){
        m3d[i]=new float*[p];
        for(j=0;j<p;j++){
            m3d[i][j]=new float[m];
            for(k=0;k<m;k++)
                m3d[i][j][k]=value;
        }
    }

    r=n;
    c=p;
    z=m;
}

mtrx3d::mtrx3d(int n,int p,int m,char *fn)
{ // constructor
    int i,j;
    //int k,rnd;
    //time_t t;

    m3d=new float **[n];
    for(i=0;i<n;i++){
        m3d[i]=new float*[p];
        for(j=0;j<p;j++){
            m3d[i][j]=new float[m];
        }
    }

    r=n;
    c=p;
    z=m;

    ifstream in(fn,ios::in);
    in >> *this;
}


mtrx3d::mtrx3d(mtrx3d& m3d1) // copy-initializer
{
    r=m3d1.r;
    c=m3d1.c;
    z=m3d1.z;

    m3d=new float **[r];
    for(int i=0;i<r;i++){
        m3d[i]=new float*[c];
        for(int j=0;j<c;j++){
            m3d[i][j]=new float[z];
            for(int k=0;k<z;k++)
                m3d[i][j][k]=m3d1.m3d[i][j][k];
        }
    }
}

mtrx3d::~mtrx3d()
{
    delete []m3d;
} // destructor

int mtrx3d::depth(){return r;}
int mtrx3d::width(){return c;}
int mtrx3d::height(){return z;}

mtrx3d& mtrx3d::operator=(const mtrx3d& m3d1)
{
    for(int i=0;i<r;i++)
        for(int j=0;j<c;j++) // second index runs over c (the listing had r)
            delete m3d[i][j];
    r=m3d1.r;
    c=m3d1.c;
    z=m3d1.z;
    m3d=new float**[r];
    for(i=0;i<r;i++){
        m3d[i]=new float*[c];
        for(int j=0;j<c;j++){
            m3d[i][j]=new float[z];
            for(int k=0;k<z;k++) // third index runs over z (the listing had r)
                m3d[i][j][k]=m3d1.m3d[i][j][k];
        }
    }

    return *this;
}

mtrx3d mtrx3d::operator+(const mtrx3d& m3d1)
{
    int i,j,k;

    mtrx3d sum(r,c,z);
    for(i=0;i<r;i++)
        for(j=0;j<c;j++)
            for(k=0;k<z;k++)
                sum.m3d[i][j][k]=m3d1.m3d[i][j][k]+m3d[i][j][k];
    return sum;
}

int mtrx3d::save(FILE *f) // save binary values of 3d matrix to specified file
{
    int success=1;
    for(int i=0;i<r;i++)
        for(int j=0;j<c;j++)
            for(int k=0;k<z;k++)
                if(fwrite(&(m3d[i][j][k]),sizeof(m3d[0][0][0]),1,f) < 1)
                    success=0;

    return success;
}

int mtrx3d::load(FILE *f) // load binary values of 3d matrix from specified file
{
    int success=1;
    for(int i=0;i<r;i++)
        for(int j=0;j<c;j++)
            for(int k=0;k<z;k++)
                if(fread(&(m3d[i][j][k]),sizeof(m3d[0][0][0]),1,f) < 1)
                    success=0;

    return success;
}


float mtrx3d::getval(int row,int col,int z)
{
    return m3d[row][col][z];
}

void mtrx3d::setval(int row,int col,int z,float val)
{
    m3d[row][col][z] = val;
}

mtrx3d& mtrx3d::operator+=(const mtrx3d& m3d1)
{
    int i,j,k;

    for(i=0;i<r&&i<m3d1.r;i++)
        for(j=0;j<c&&j<m3d1.c;j++)
            for(k=0;k<z&&k<m3d1.z;k++)
                m3d[i][j][k]+=(m3d1.m3d[i][j][k]);
    return *this;
}

mtrx3d& mtrx3d::operator*(const float d)
{
    int i,j,k;

    for(i=0;i<r;i++)
        for(j=0;j<c;j++)
            for(k=0;k<z;k++)
                m3d[i][j][k]*=d;
    return *this;
}

mtrx3d& mtrx3d::operator*=(const float d)
{
    int i,j,k;

    for(i=0;i<r;i++)
        for(j=0;j<c;j++)
            for(k=0;k<z;k++)
                m3d[i][j][k]*=d;
    return *this;
}


ostream& operator<<(ostream& s,mtrx3d& m3d1)
// print a 3d matrix
{
    for(int i=0;i<m3d1.r;i++){
        for(int j=0;j<m3d1.c;j++){
            for(int k=0;k<m3d1.z;k++){
                s << m3d1.m3d[i][j][k] << " ";
            }
        }
        s << "\n";
    }
    return s;
}

istream& operator>>(istream& s,mtrx3d& m3d1)
{
    for(int i=0;i<m3d1.r;i++)
        for(int j=0;j<m3d1.c;j++)
            for(int k=0;k<m3d1.z;k++) // innermost bound is the delay count z
                s >> m3d1.m3d[i][j][k];
    return s;
}
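
The mtrx3d member functions above manage a float*** by hand, so the destructor, copy-initializer, and assignment operator must all agree on how each level is allocated and freed. As a sketch only, and not the thesis implementation, the same r x c x z storage could be held in a single contiguous std::vector, which leaves copying and destruction to the compiler. The class name Grid3 and its private helper idx are hypothetical and appear nowhere in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mtrx3d: one contiguous buffer instead of float***.
class Grid3 {
    int r, c, z;               // rows, columns, time delays
    std::vector<float> data;   // r*c*z values, zero-initialized
    // Map (i,j,k) to a flat offset; k varies fastest.
    std::size_t idx(int i, int j, int k) const {
        return (static_cast<std::size_t>(i) * c + j) * z + k;
    }
public:
    Grid3(int rows, int cols, int delays)
        : r(rows), c(cols), z(delays),
          data(static_cast<std::size_t>(rows) * cols * delays, 0.0f) {}
    int depth() const { return r; }
    int width() const { return c; }
    int height() const { return z; }
    float getval(int i, int j, int k) const { return data[idx(i, j, k)]; }
    void setval(int i, int j, int k, float v) { data[idx(i, j, k)] = v; }
    // Element-wise sum; assumes both operands have identical dimensions.
    Grid3 operator+(const Grid3& o) const {
        assert(r == o.r && c == o.c && z == o.z);
        Grid3 sum(r, c, z);
        for (std::size_t n = 0; n < data.size(); ++n)
            sum.data[n] = data[n] + o.data[n];
        return sum;
    }
    // Scale every element in place.
    Grid3& operator*=(float d) {
        for (float& x : data) x *= d;
        return *this;
    }
};
```

With this layout the compiler-generated copy constructor and assignment operator already perform a deep copy, so no hand-written destructor is needed.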




////////////////////////////////////////////////////////////

//ATNN.H 

//  Header  file  for  Adaptive  Time  Delay  Neural  Net  implementation 
//  Developed  with  Turbo  C++  3.0 

//  Author:  Capt  James  Gainey,  GEO-93D  Last  Modified:  10  Sep  93 


#include "net.h"

class atnn : public net { // adaptive time delay network derived from net
private:
    int q; // size of hidden layer
    int K; // max # of time delays (range of taus)
    mtrx3d *W1,*W2; // synapse weight matrices
    mtrx3d *dW1,*dW2; // used to compute changes to matrices
    mtrx3d *Tau1,*Tau2; // synapse time delay matrices
    mtrx3d *dTau1,*dTau2; // used to compute changes to time delay matrices

    matrix *a_buf,*h_buf; // buffer input to nodes over all time-delays
    matrix *a_buf_p,*h_buf_p; // buffer for derivative of inputs to
                              // nodes used to compute delta tau
    vec *h,*o,*d,*e,*thresh1,*thresh2,*in;

    int epoch, TDNN;
    vec *totd,*tote;
    vecpair *minvecs,*maxvecs;
    float tstep,momentum,initrange,learnrate2,MSE,avg_MSE,accuracy,tMSE;

    // private member functions
    // these are helper member functions
    void buffer(matrix *m1,vec *v1);
    void deriv(matrix& m2,matrix& m1,mtrx3d& m3d1,
               const float tstep=1.0f);

    vec propagate(vec& v1,mtrx3d& m3d1,mtrx3d& m3d2,matrix& m1);
    vec backprop(vec& v1,vec& v2,mtrx3d& m3d1);

    void initdWts(mtrx3d& m3d,matrix& m1,const vec& v1,
                  const float rate=1.0,const float momentum=0.0);
    void initdTau(mtrx3d& m3d1,mtrx3d& m3d2,matrix& m1,const vec& v1,
                  const float rate=1.0);

    int saveweights();
    int loadweights();
    float cycle(istream& s);

public:
    // public member functions
    atnn(char *s); // constructs based on <name>.DEF file
    ~atnn(); // destructor

    // override pure virtual functions
    int encode(vecpair& v); // store one pattern pair
    vec recall(vec &v); // recall an output pattern given an input
    float test();
    void run();
};
////////////////////////////////////////////////////////////

//NET.H 

//  Header  file  for  abstract  neural  network  base  class 

//  To  be  used  as  parent  to  specific  neural  network  implementations. 

//  The  encode  and  recall  methods  are  defined  as  pure  virtual  functions 
//  making  this  an  abstract  class  that  can  never  be  instantiated.

//  Details  of  encode  and  recall  must  depend  on  the  topology 
//  itself.  However,  the  methods  "train",  "test",  and  "run"

//  can  be  defined  since  they  are  substantively  the  same  for  each 
//  of  the  classes.  The  constructor  can  be  defined  and  will  be  used 
//  by  child  classes  in  their  own  constructors  to  instantiate 
//  common  elements  of  derived  classes. 

#include  "vecmat.h" 

//  parameter  class  used  to  point  to  variable  to  be  initialized 
//  and  specify  string  to  be  used  in  definition  file  to  initialize  it 

enum  vartype  {real,integer,string}; 
const int NAMELEN=16;

typedef struct {
    char name[16]; /* string to init value */
    vartype type;
    union {
        char s[8];
        float f;
        int i;
    } val;
} PARM;

istream& readparm(istream& s,int noparms,PARM *p);
int readparms(int n,PARM *p,char *name);

////////////////////////////////////////////////////////////
// NET CLASS
// 

class net {
protected:
    char *name; // string used as basename for files
    int n; // size of input layer
    int p; // size of output layer
    float learnrate; // learning rate (defined as 1 where not gradual)
    float decayrate; // decay (default constructed zero if not applicable)
    float tolerance;
    int iters;
    int cycleno;
    vecpair *minvecs,*maxvecs;

    // weight saving methods since we don't know topology
    // they must be pure virtual
    virtual int saveweights(void) = 0;
    virtual int loadweights(void) = 0;
    int skipcmt(istream& s);
public:
    enum parmtype {inputs,outputs,learn,decay};

    net();
    net(char *s);
    net(char *s,int noparms,PARM *p);
    ~net();

    // encode and recall are "pure virtual", which makes
    // the net class abstract
    virtual int encode(vecpair& v) = 0;
    virtual vec recall(vec &v) = 0; // recall an output pattern given an input
    virtual float cycle(istream& s);
    virtual void train();
    int getiters(void){return iters;}
    virtual float test(); // floating point value indicates percentage correct of test
    virtual void run();
}; 
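
The header comment for NET.H describes an abstract base class whose encode and recall methods are pure virtual, while the shared driver methods are defined once in the base. The following miniature illustrates that pattern under simplifying assumptions: the names Net and LastTarget, the use of std::vector in place of vec and vecpair, and the three-argument train signature are all hypothetical and are not part of the thesis code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the net/atnn relationship.
class Net {
protected:
    int iters = 0;  // number of pattern presentations so far
public:
    virtual ~Net() {}
    // Topology-specific operations: each derived network must supply these.
    virtual int encode(const std::vector<float>& in,
                       const std::vector<float>& target) = 0;
    virtual std::vector<float> recall(const std::vector<float>& in) = 0;
    // Shared driver: presents one pattern pair 'passes' times and
    // returns how many presentations encode() reported as successful.
    int train(const std::vector<float>& in,
              const std::vector<float>& target, int passes) {
        int ok = 0;
        for (int i = 0; i < passes; ++i) {
            ok += encode(in, target);
            ++iters;
        }
        return ok;
    }
    int getiters() const { return iters; }
};

// Toy derived class: remembers the last target and returns it on recall.
class LastTarget : public Net {
    std::vector<float> stored;
public:
    int encode(const std::vector<float>&,
               const std::vector<float>& target) override {
        stored = target;
        return 1;
    }
    std::vector<float> recall(const std::vector<float>&) override {
        return stored;
    }
};
```

Because Net declares pure virtual members, an attempt to declare a Net object directly fails to compile; only a derived class such as LastTarget can be instantiated and then driven through a Net reference.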
////////////////////////////////////////////////////////////

//  VECMAT.H

//  Vector  and  matrix  classes 

//  Author:  Capt  James  Gainey,  GEO-93D  Last  Modified:  10  Sep  93
//Modified  from  BP.CPP  Adam  Blum  (1990) 

#include<stdlib.h> 


#include<fcntl.h> 

#include<stdio.h> 

#include<string.h> 

#include<limits.h> 

#include<ctype.h> 

#include<math.h> 

#include<time.h> 

#include<float.h> 

#ifdef __TURBOC__
#include<sys\stat.h>
#include<io.h>
#include<conio.h>
#include<alloc.h>
#elif defined(__ZTC__)
#include<dos.h>
#else
#include <gl/gl.h>
#include <gl/device.h> // for button constant ESCKEY
#endif

#include<iostream.h> 

#include<fstream.h> 

#include<iomanip.h> 

#define max(a,b) (((a) > (b)) ? (a) : (b)) // C++ doesn't have min/max
#define min(a,b) (((a) < (b)) ? (a) : (b))

#include "debug.h"

double  logistic(double  activation); 

#ifdef __TURBOC__
int _Cdecl intcmp(const void* i1, const void *i2);
#else
int intcmp(const void* i1, const void *i2);
#endif

// will be changed to much higher than these values
const int ROWS=64; // number of rows (length of first pattern)
const int COLS=64; // number of columns (length of second pattern)
const int DELAYS=64; // number of time delays
const int MAXVEC=64; // default size of vectors

class  mtrx3d; 

class  matrix; 


class vec {
    friend ostream& operator<<(ostream& s,vec& v1);
#ifdef __TURBOC__
    friend ostream far& operator<<(ostream far& s,vec far& v1);
#endif
    friend class matrix;
    friend class mtrx3d;
    friend class bp;
    friend class atnn;
    friend istream& operator>>(istream& s,vec& v1);
    int n;
    float *v;
public:
    vec(int size=MAXVEC,int val=0); // constructor
    ~vec(); // destructor
    vec(vec &v1); // copy-initializer
    int length();
    float distance(vec& A);
    vec& normalize();
    vec& normalizeon();
    vec& randomize(float initrange=1.0);
    vec& scale(vec& minvec,vec& maxvec);
    float d_logistic(); // dot product of vector and complement
    float maxval();
    vec& garble(float noise);

    vec& operator=(const vec& v1); // vector assignment
    vec operator+(const vec& v1); // vector addition
    vec operator+(const float d);
    vec& operator+=(const vec& v1); // vector additive-assignment
    // supplied for completeness, but we don't use this now
    vec& operator*=(float c); // vector multiply by constant
    // vector transpose multiply needs access to v array
    int operator==(const vec& v1);
    float operator[](int x);
    int maxindex();
    vec& getstr(char *s);
    void putstr(char *s);
    vec operator-(const vec& v1); // vector subtraction
    vec operator-(const float d); // subtraction
    float operator*(const vec& v1); // dot-product
    vec operator*(float c); // multiply by constant
    vec& sigmoid(vec& thresh);
    vec& set(int i, float f=0);
    int load(FILE *f);
    int save(FILE *f);
}; // end vector class

class  vecpair; 


class matrix {
    // we only allow access here to improve backpropagation's performance
    friend ostream& operator<<(ostream& s,matrix& m1);
    friend istream& operator>>(istream& s,matrix& m1);
protected:
    float **m; // the matrix representation
    int r,c; // number of rows and columns
public:
    // constructors
    matrix(int n=ROWS,int p=COLS,float range=0);
    matrix(int n,int p,float value,float range);
    matrix(int n,int p,char *fn);
    matrix(const vecpair& vp);
    matrix(vec& v1,vec& v2);
    matrix(matrix& m1); // copy-initializer
    ~matrix();
    int depth();
    int width();

    matrix& operator=(const matrix& m1);
    matrix& operator=(const vecpair& v);
    matrix operator+(const matrix& m1);
    vec operator*(vec& v1);
    vec colslice(int col);
    vec rowslice(int row);
    void insertcol(vec& v,int col);
    void insertrow(vec& v,int row);
    int closestcol(vec& v);
    int closestrow(vec& v);
    int closestrow(vec& v,int *wins,float scaling);
    int load(FILE *f);
    int save(FILE *f);
    float getval(int row, int col);
    void setval(int row, int col,float val);
    void initvals(const vec& v1,const vec& v2,
                  const float rate=1.0, const float momentum=0.0);
    matrix& operator+=(const matrix& m1);
    matrix& operator*(const float d);
    matrix& operator*=(const float d);
}; // end matrix class


class vecpair {
    friend class matrix;
    friend istream& operator>>(istream& s,vecpair& v1);
    friend ostream& operator<<(ostream& s,vecpair& v1);
    friend matrix::matrix(const vecpair& vp);
    int flag; // flag signalling whether encoding succeeded
public:
    vec *a;
    vec *b;
    vecpair(int n=ROWS,int p=COLS,int val=0); // constructor
    vecpair(vec& A,vec& B);
    vecpair(const vecpair& AB); // copy initializer
    ~vecpair();
    vecpair& operator=(const vecpair& v1);
    int operator==(const vecpair& v1);
    vecpair& scale(vecpair& minvecs,vecpair& maxvecs);
}; // end vecpair class

class mtrx3d {
    friend class matrix;
    friend istream& operator>>(istream& s,mtrx3d& m3d1);
    friend ostream& operator<<(ostream& s,mtrx3d& m3d1);
protected:
    float ***m3d; // the 3D matrix representation
    int r,c,z; // number of rows, columns, and time delays
public:
    // constructors
    mtrx3d(int n=ROWS,int p=COLS,int m=DELAYS,float range=0);
    mtrx3d(int n,int p,int m,int irange);
    mtrx3d(int n,int p,int m,float value,float range);
    mtrx3d(int n,int p,int m,char *fn);
    mtrx3d(mtrx3d& m3d1); // copy-initializer
    ~mtrx3d();
    int depth();
    int width();
    int height();

    mtrx3d& operator=(const mtrx3d& m3d1);
    mtrx3d operator+(const mtrx3d& m3d1);

    int load(FILE *f);
    int save(FILE *f);
    float getval(int row,int col,int z);
    void setval(int row,int col,int z,float val);
    mtrx3d& operator+=(const mtrx3d& m3d1);
    mtrx3d& operator*(const float d);
    mtrx3d& operator*=(const float d);
}; // end 3d matrix class